large file support for 32-bit Embedded Linux - seek fails after 2GB

General discussions and questions about development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
c_toth
Posts: 3
Joined: Tue Oct 02, 2018 2:24 pm

Post by c_toth » Tue Oct 02, 2018 2:48 pm

I recently ran into a problem where .seek() fails for offsets above 2147483647 (2 GB) on MicroPython 1.9.4, cross-compiled for ARM and intended for 32-bit embedded Linux. The script was tested using qemu-arm -cpu any.

The file / hard drive offsets I need to access are larger than seek() can handle, which results in the following error:
Traceback (most recent call last):
File "hd_test.py", line 22, in <module>
OverflowError: overflow converting long int to machine word


Here's my code (the script takes one argument: the full path of the file or hard drive):

#!/usr/bin/env micropython

import sys
import ubinascii

drive = sys.argv[1]  # complete path to the target file or hard drive
print("DEBUG - Max Int Size:", sys.maxsize)  # largest native (machine-word) integer on this platform

try:
    f = open(drive, mode="rb+")
except OSError:
    print("Error opening target for READ / WRITE.")
    sys.exit(1)


f.seek(2147483647)
print(f.read(8))

f.seek(147483600, 1)  # whence=1: seek relative to the current position
print(f.read(8))

f.seek(3147483647)  # fails here: OverflowError: overflow converting long int to machine word
data = f.read(4)

f.close()

jickster
Posts: 629
Joined: Thu Sep 07, 2017 8:57 pm

Post by jickster » Tue Oct 02, 2018 4:46 pm

c_toth wrote:
Tue Oct 02, 2018 2:48 pm
I recently ran into a problem where the .seek() fails after 2147483647 (2GB) on Micropython 1.9.4 cross compiled for ARM intended for 32-bit embedded Linux. Script was tested using qemu-arm -cpu any

This is a known issue:

Code: Select all

STATIC mp_obj_t stream_seek(size_t n_args, const mp_obj_t *args) {
    struct mp_stream_seek_t seek_s;
    // TODO: Could be uint64
    seek_s.offset = mp_obj_get_int(args[1]);
    seek_s.whence = SEEK_SET;
    // ... (rest of stream_seek() omitted)

The problem is that mp_obj_get_int() returns an mp_int_t, which is pointer-sized: on a 32-bit build it is a signed 32-bit integer, so the largest offset it can hold is 2147483647 (2^31 - 1).

mp_obj_get_int() converts a Python integer, which can be arbitrarily large, into a machine word, which is only 32 bits wide on a 32-bit build. The conversion is needed because the offset has to be passed down to the stream's ioctl function, which knows nothing about Python integer objects.
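To make the limit concrete, here is a small Python model of the range check a 32-bit build effectively applies when turning the seek offset into a signed machine word. The helper `to_machine_word` and its `bits` parameter are hypothetical; this is not MicroPython's C code, just a sketch assuming `mp_int_t` is a signed 32-bit type on this build.

```python
MAX_OFFSET_32 = (1 << 31) - 1  # 2147483647, the largest signed 32-bit value


def to_machine_word(value, bits=32):
    # Hypothetical model, not MicroPython's C code: accept only integers that
    # fit in a signed machine word of `bits` bits, otherwise raise the same
    # OverflowError message that appears in the traceback.
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError("overflow converting long int to machine word")
    return value


print(to_machine_word(2147483647))  # fits, like the first seek() in the script
try:
    to_machine_word(3147483647)  # too large, like the failing seek()
except OverflowError as exc:
    print("OverflowError:", exc)
```

On a 64-bit build (`bits=64`) the same offsets pass the check, which matches the point that mp_int_t tracks the pointer size.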

c_toth

Post by c_toth » Tue Oct 02, 2018 4:50 pm

Is there any way around this?

jickster

Post by jickster » Tue Oct 02, 2018 4:54 pm

c_toth wrote:
Tue Oct 02, 2018 4:50 pm
Is there any way around this?
Split any seek() with an offset larger than 2^31 - 1 into multiple smaller seeks.
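The splitting idea can be sketched roughly as follows. `chunked_seek` is a hypothetical helper (c_toth's own `safeseek()` is not shown in the thread): it seeks to the start, then makes relative seeks (whence=1) of at most 2^31 - 1 bytes each. It only guarantees that every individual seek() argument fits in a 32-bit machine word; whether the kernel then lets the file position move past 2 GB depends on how the port and C library were built (e.g. large-file support), which this sketch cannot control.

```python
try:
    import io
except ImportError:
    import uio as io  # older MicroPython ports name the module uio

MAX_STEP = (1 << 31) - 1  # largest offset a 32-bit build accepts per seek() call


def chunked_seek(f, offset, step=MAX_STEP):
    # Hypothetical helper: reach an absolute `offset` using several seeks whose
    # arguments never exceed `step`. Returns the number of seek() calls made.
    if offset < 0:
        raise ValueError("offset must be non-negative")
    f.seek(0)  # start from the beginning (whence=0)
    calls = 1
    while offset > 0:
        jump = min(offset, step)
        f.seek(jump, 1)  # whence=1: relative to the current position
        offset -= jump
        calls += 1
    return calls


# Usage with an in-memory file and a tiny step, so it runs anywhere:
buf = io.BytesIO(b"0123456789")
chunked_seek(buf, 7, step=3)  # seeks: 0, +3, +3, +1
print(buf.read(1))
```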

c_toth

Post by c_toth » Tue Oct 02, 2018 5:01 pm

I already tried that, and it worked for a while, but then I got another error: OSError: [Errno 22] EINVAL


safeseek(96032801808,0)
print(f.read(16))

Debugging output:

Debug Seek from Start: 2147483647 x 44 + 1543556188
DEBUG-FromStartPos - Jump 2147483647
DEBUG-FromStartPos - Jump 4294967294
DEBUG-FromStartPos - Jump 6442450941
DEBUG-FromStartPos - Jump 8589934588
DEBUG-FromStartPos - Jump 10737418235
DEBUG-FromStartPos - Jump 12884901882
DEBUG-FromStartPos - Jump 15032385529
DEBUG-FromStartPos - Jump 17179869176
DEBUG-FromStartPos - Jump 19327352823
DEBUG-FromStartPos - Jump 21474836470
DEBUG-FromStartPos - Jump 23622320117
DEBUG-FromStartPos - Jump 25769803764
DEBUG-FromStartPos - Jump 27917287411
DEBUG-FromStartPos - Jump 30064771058
DEBUG-FromStartPos - Jump 32212254705
DEBUG-FromStartPos - Jump 34359738352
DEBUG-FromStartPos - Jump 36507221999
DEBUG-FromStartPos - Jump 38654705646
DEBUG-FromStartPos - Jump 40802189293
DEBUG-FromStartPos - Jump 42949672940
DEBUG-FromStartPos - Jump 45097156587
DEBUG-FromStartPos - Jump 47244640234
DEBUG-FromStartPos - Jump 49392123881
DEBUG-FromStartPos - Jump 51539607528
DEBUG-FromStartPos - Jump 53687091175
DEBUG-FromStartPos - Jump 55834574822
DEBUG-FromStartPos - Jump 57982058469
DEBUG-FromStartPos - Jump 60129542116
DEBUG-FromStartPos - Jump 62277025763
DEBUG-FromStartPos - Jump 64424509410
DEBUG-FromStartPos - Jump 66571993057
DEBUG-FromStartPos - Jump 68719476704
DEBUG-FromStartPos - Jump 70866960351
DEBUG-FromStartPos - Jump 73014443998
DEBUG-FromStartPos - Jump 75161927645
DEBUG-FromStartPos - Jump 77309411292
DEBUG-FromStartPos - Jump 79456894939
DEBUG-FromStartPos - Jump 81604378586
DEBUG-FromStartPos - Jump 83751862233
DEBUG-FromStartPos - Jump 85899345880
DEBUG-FromStartPos - Jump 88046829527
DEBUG-FromStartPos - Jump 90194313174
DEBUG-FromStartPos - Jump 92341796821
DEBUG-FromStartPos - Jump 94489280468
Traceback (most recent call last):
File "hdtest.py", line 150, in <module>
File "hdtest.py", line 72, in safeseek
OSError: [Errno 22] EINVAL

pfalcon
Posts: 1155
Joined: Fri Feb 28, 2014 2:05 pm

Post by pfalcon » Tue Oct 02, 2018 5:12 pm

c_toth wrote:
Tue Oct 02, 2018 2:48 pm
I recently ran into a problem where the .seek() fails after 2147483647 (2GB) on Micropython 1.9.4 cross
Yeah, I think about it from time to time, and lately my default reply to this issue has been "for 64-bit offsets, we have 64-bit architecture support in MicroPython". Feel free to describe your use case, but so far MicroPython's approach has been "for 32-bit systems, 4GB should be enough for everyone; for the rest, we have 64-bit support". Changing that would involve nice, careful (as usual!) patches to introduce optional 64-bit stream offset support. And mind that such support depends on the kernel, so it wouldn't necessarily work on an arbitrary device (and if your choice of device isn't arbitrary, it could perhaps again lean towards a 64-bit system).
Awesome MicroPython list
Pycopy - A better MicroPython https://github.com/pfalcon/micropython
MicroPython standard library for all ports and forks - https://github.com/pfalcon/micropython-lib
More up to date docs - http://pycopy.readthedocs.io/

jickster

Post by jickster » Tue Oct 02, 2018 5:17 pm

pfalcon wrote:
"...for 32-bit systems, 4GB should be enough for everyone, for rest, we have 64-bit support..."

The binary architecture (32-bit vs. 64-bit) has nothing to do with the size of the flash drive you use.


jickster

Post by jickster » Tue Oct 02, 2018 6:21 pm

c_toth wrote:
Tue Oct 02, 2018 5:01 pm
I already tried that and it worked for a while.. but then I got another error: (OSError: [Errno 22] EINVAL)


That's a Linux error, not a MicroPython one.
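To tell the two failure modes apart in your own script (an OverflowError raised while converting the offset, versus an OSError carrying an errno code such as EINVAL), a wrapper along these lines could be used. `try_seek` is a hypothetical name; the sketch assumes the port exposes the errno module as `errno` or `uerrno` and that `OSError.args[0]` holds the errno code.

```python
try:
    import errno
except ImportError:
    import uerrno as errno  # older MicroPython ports name the module uerrno


def try_seek(f, offset, whence=0):
    # Hypothetical wrapper: report the errno of a failed seek, then re-raise.
    # OverflowError (offset too big for a machine word) is not an OSError,
    # so it passes through this handler untouched.
    try:
        return f.seek(offset, whence)
    except OSError as exc:
        code = exc.args[0]  # the errno value on both MicroPython and CPython
        if code == errno.EINVAL:
            print("seek(%d, %d) failed with EINVAL" % (offset, whence))
        else:
            print("seek(%d, %d) failed with errno %d" % (offset, whence, code))
        raise
```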
