large file support for 32-bit Embedded Linux - seek fails after 2GB

General discussions and questions about development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
alidayvn
Posts: 4
Joined: Fri Aug 30, 2019 8:34 am


Post by alidayvn » Wed Oct 09, 2019 9:38 am

I recently ran into a problem where .seek() fails for offsets beyond 2147483647 (2 GB) on MicroPython 1.9.4, cross-compiled for ARM and intended for 32-bit embedded Linux. The script was tested using qemu-arm -cpu any.

The file / hard-drive offsets I need to access are larger than seek() can handle, which results in the following error:
Traceback (most recent call last):
  File "hd_test.py", line 22, in <module>
OverflowError: overflow converting long int to machine word


Here's my code (the script takes one argument: the full path to the file or hard drive):

#!/usr/bin/env micropython

import sys
import ubinascii

drive = sys.argv[1]  # Complete path to the target file / hard drive
print("DEBUG - Max Int Size:", sys.maxsize)  # largest native (machine word) integer on this platform

try:
    f = open(drive, mode="rb+")
except OSError:
    print("Error opening target for READ / WRITE.")
    sys.exit(1)


f.seek(2147483647)  # 2**31 - 1: the largest offset that still fits a signed 32-bit word
print(f.read(8))

f.seek(147483600, 1)  # whence=1: seek relative to the current position
print(f.read(8))

f.seek(3147483647)  # Fails here: OverflowError: overflow converting long int to machine word
data = f.read(4)

f.close()
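One possible workaround, sketched under the assumption that the underlying lseek() can actually address offsets beyond 2 GB (which the replies question): reach the target with an absolute seek no larger than 2**31 - 1, then continue with relative seeks, so no single seek() argument overflows a 32-bit machine word. seek_large is a hypothetical helper, not part of MicroPython.

```python
# Hypothetical workaround sketch: reach a large absolute offset using only
# seek() arguments that fit in a signed 32-bit machine word.
# Assumption: the underlying lseek() supports the target offset (large file
# support); if it does not, this only moves the failure to a later call.

SEEK_SET = 0  # absolute positioning
SEEK_CUR = 1  # relative to the current position

MAX_WORD = 2**31 - 1  # largest value a signed 32-bit word can hold


def seek_large(f, offset, max_step=MAX_WORD):
    """Move f to absolute `offset` using seek() calls no larger than max_step."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    step = min(offset, max_step)
    f.seek(step, SEEK_SET)  # first hop is absolute
    remaining = offset - step
    while remaining > 0:
        step = min(remaining, max_step)
        f.seek(step, SEEK_CUR)  # later hops are relative
        remaining -= step
```

In the script above, the failing f.seek(3147483647) would become seek_large(f, 3147483647), followed by the same f.read(4).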

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am


Post by stijn » Wed Oct 09, 2019 10:28 am

Looks like MicroPython uses a 32-bit argument for seek() when compiled for 32-bit. That could be changed, but that only makes sense if the underlying system call (lseek(), I think) also supports it; do you know if that is the case for your particular implementation?
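To make the 32-bit limit concrete, here is a plain-Python emulation of the range check that, presumably, produces the error in the original post when seek() converts its offset argument to a signed machine word. to_machine_word is a hypothetical helper, and its bits parameter stands in for the word size of the port.

```python
# Emulation (not MicroPython's actual code) of converting a Python int to a
# signed machine word of a given width.

def to_machine_word(value, bits=32):
    """Return value unchanged if it fits in a signed `bits`-bit word, else raise."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        # Same message as the traceback in the original post
        raise OverflowError("overflow converting long int to machine word")
    return value


# The three offsets passed to seek() in the test script
for offset in (2147483647, 147483600, 3147483647):
    try:
        to_machine_word(offset)
        print(offset, "fits")
    except OverflowError as e:
        print(offset, "->", e)
```

Only 3147483647 is rejected, matching the line that fails in the script; with bits=64 it would pass.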

dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada


Post by dhylands » Wed Oct 09, 2019 4:30 pm

On 32-bit Linux, lseek() takes an off_t, which winds up being a long, and a long is 32 bits on a 32-bit architecture.

Under 32-bit Linux, you would use lseek64() to seek (absolutely) past the 2 GB mark. off64_t is a long long, which is 64 bits on a 32-bit architecture.
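The arithmetic behind the 2 GB limit, as a small sketch. It assumes a 32-bit off_t (no large file support) and a 64-bit off64_t, as described in this post; max_signed_offset is a hypothetical helper.

```python
# Largest byte offset representable by a signed integer of a given width.
# Assumption: off_t is 32 bits and off64_t is 64 bits on this target.

def max_signed_offset(bits):
    return (1 << (bits - 1)) - 1

OFF_T_MAX = max_signed_offset(32)    # 2147483647
OFF64_T_MAX = max_signed_offset(64)  # 9223372036854775807

print("off_t max:  ", OFF_T_MAX, "bytes (~%.1f GiB)" % (OFF_T_MAX / 2**30))
print("off64_t max:", OFF64_T_MAX, "bytes")
print("3147483647 fits in off_t?  ", 3147483647 <= OFF_T_MAX)
print("3147483647 fits in off64_t?", 3147483647 <= OFF64_T_MAX)
```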

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am


Post by stijn » Wed Oct 09, 2019 5:55 pm

If lseek64 is generally available on both 32-bit and 64-bit architectures, it might be good to just always use it for the unix port?

dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada


Post by dhylands » Wed Oct 09, 2019 6:13 pm

Unfortunately, it isn't that simple.

This is the call to lseek: https://github.com/micropython/micropyt ... ile.c#L107
which uses the mp_stream_seek_t structure.

So you need to track down all of the users of that struct and then see what impact changing the size of mp_off_t would have.

I did notice that the comment says that if you're doing a SEEK_SET then the number should be treated as unsigned. Obviously, that isn't what actually happens, or you'd be able to seek to positions between 2 GB and 4 GB. So you might want to investigate fixing that first.
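A sketch of what that SEEK_SET comment implies, modeling only the bit patterns: the same 32 bits read as unsigned cover offsets up to 4 GB, while read as signed they turn 3147483647 into a negative number. as_uint32 and as_int32 are hypothetical helpers, not MicroPython functions, and an unsigned offset would still need an lseek() able to accept it.

```python
# Hypothetical helpers modeling how one 32-bit pattern reads as unsigned vs
# signed (two's complement). They do not reproduce MicroPython's code.

MASK32 = 0xFFFFFFFF

def as_uint32(value):
    """Reinterpret the low 32 bits of value as an unsigned integer."""
    return value & MASK32

def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

offset = 3147483647
print(as_uint32(offset))  # 3147483647: fits when read as unsigned
print(as_int32(offset))   # -1147483649: the same bits read as signed
```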
