Max number of files to iterate with os.listdir()

The official PYBD running MicroPython, and its accessories.
Target audience: Users with a PYBD
MrRobot
Posts: 31
Joined: Mon May 11, 2020 11:30 am

Max number of files to iterate with os.listdir()

Post by MrRobot » Fri Jun 19, 2020 11:13 am

So I'm making a data logger using my Pyboard D.

I need to store data on the SD card in folders.

From experimenting I've found that the max number of files per folder is 32768 (2^15).

However, I run into trouble when trying to index files using os.listdir(); I'm guessing this is due to limited memory on the Pyboard.

But the issue is that the number of files I can index changes; there doesn't seem to be a constant limit.

Does anyone know the maximum number of files os.listdir() can index before throwing an error?

stijn
Posts: 735
Joined: Thu Apr 24, 2014 9:13 am

Re: Max number of files to iterate with os.listdir()

Post by stijn » Fri Jun 19, 2020 11:32 am

It would help if you explained your goal. Do you really need os.listdir()? As far as I know it returns a list, so you need enough memory to store all the names at once. But if you don't need the list as a whole, which is not unlikely, you could use ilistdir() instead: that gives you an iterator, so you can access each entry one by one and it hardly uses any memory.
Does anyone know the maximum files os.listdir() can index before throwing an error?
That is impossible to answer: it's limited by memory, which is limited by your hardware and software, so unless you have a program which does exactly the same thing on each run, the results will vary.
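A minimal sketch of the iterator approach: count directory entries one at a time without ever building the full list. `os.ilistdir()` is the MicroPython call; to keep the sketch runnable elsewhere it falls back to CPython's closest equivalent, `os.scandir()`. The function name `count_entries` is just illustrative.

```python
import os

def count_entries(path):
    # Walk a directory entry by entry instead of materialising a list,
    # so memory use stays constant regardless of directory size.
    # MicroPython: os.ilistdir(); CPython fallback: os.scandir().
    iterate = getattr(os, "ilistdir", None) or os.scandir
    n = 0
    for _entry in iterate(path):
        n += 1
    return n
```

The same pattern works for any per-file processing: act on each entry inside the loop and let it be garbage-collected before the next one.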

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: Max number of files to iterate with os.listdir()

Post by pythoncoder » Sun Jun 21, 2020 8:53 am

Do you really need a huge number of files? This asks the filesystem to perform the data lookup, which gets unwieldy when the number of files becomes excessive.

There are usually better options, which boil down to letting Python do the lookup. Use MicroPython's btree database. Or store an object such as a JSON-encoded Python dict in a file. Or append lines of text to a single file.
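The last option (appending lines of text to a single file) might look like the sketch below: one JSON-encoded record per line, so no directory scan is ever needed and each write only touches the end of the file. The function names and record layout are illustrative, not from the thread; `json` here stands in for MicroPython's `ujson`.

```python
import json  # on MicroPython this would be ujson

def append_record(filename, record):
    # Append-only log: one JSON object per line.
    with open(filename, "a") as f:
        f.write(json.dumps(record) + "\n")

def read_records(filename):
    # Stream the log back one line (one record) at a time,
    # so memory use stays constant however large the file grows.
    with open(filename) as f:
        for line in f:
            yield json.loads(line)
```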
Peter Hinch
Index to my micropython libraries.

chuckbook
Posts: 135
Joined: Fri Oct 30, 2015 11:55 pm

Re: Max number of files to iterate with os.listdir()

Post by chuckbook » Mon Jun 22, 2020 9:34 am

The FAT filesystem performs terribly with more than a few thousand entries per directory.
Try using a multi-level hierarchy with fewer than 1000 entries per directory.
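A possible sketch of such a hierarchy, assuming sequentially numbered log files: derive the subdirectory from the file number so that no directory ever holds more than `per_dir` entries. The naming scheme (`log_<n>.csv`, three-digit subdirectories) is an assumption for illustration.

```python
import os

def shard_path(root, index, per_dir=1000):
    # Map sequential file number `index` into a two-level hierarchy,
    # e.g. index 2345 with per_dir=1000 -> root/002/log_2345.csv
    sub = "%03d" % (index // per_dir)
    d = root + "/" + sub
    try:
        os.mkdir(d)
    except OSError:
        pass  # subdirectory already exists
    return d + "/log_%d.csv" % index
```

With this layout, listing any single subdirectory stays cheap, and finding file N never requires scanning a huge directory.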

MrRobot
Posts: 31
Joined: Mon May 11, 2020 11:30 am

Re: Max number of files to iterate with os.listdir()

Post by MrRobot » Mon Jun 22, 2020 1:30 pm

pythoncoder wrote:
Sun Jun 21, 2020 8:53 am
Do you really need a huge number of files? This is asking the filesystem to perform data lookup which gets unwieldy when the number of files becomes excessive.

There are usually better options which boil down to letting Python do the lookup. Use MicroPython's btree database. Or store an object such as a JSON encoded Python dict in a file. Or append lines of text to a single file.
Well, I'm transmitting data to my server using the Pyboard and a SIM module. So I really just need a way to back up my data on the SD card in the event it can't send the data.

I'll look into the btree module. Is there any limit to the number of entries that can be stored in it?

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: Max number of files to iterate with os.listdir()

Post by pythoncoder » Mon Jun 22, 2020 3:56 pm

I don't think there's any limit other than available space for the file. The btree database behaves like a persistent dictionary. From your description items can be removed once they have been sent, so excessive growth sounds unlikely.
Peter Hinch
Index to my micropython libraries.

MrRobot
Posts: 31
Joined: Mon May 11, 2020 11:30 am

Re: Max number of files to iterate with os.listdir()

Post by MrRobot » Tue Jun 23, 2020 9:10 am

pythoncoder wrote:
Mon Jun 22, 2020 3:56 pm
I don't think there's any limit other than available space for the file. The btree database behaves like a persistent dictionary. From your description items can be removed once they have been sent, so excessive growth sounds unlikely.
Hi, I'm using the latest daily build on my Pyboard D and I'm unable to import the btree module:

Code: Select all

>>> import btree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: no module named 'btree'

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Grrr.

Post by pythoncoder » Tue Jun 23, 2020 11:16 am

Sorry, I find all these configuration options extremely confusing. The docs indicate that btree exists, but probing around in the source tree suggests that it needs to be explicitly enabled in the build via mpconfigboard.mk:

Code: Select all

# btree module using Berkeley DB 1.xx
MICROPY_PY_BTREE = 1
For reasons I can't even guess at, the module is compiled into the Unix build but not (AFAICS) into the network-capable hardware builds.

Another option is to maintain a Python dict and make it persist using ujson.
Peter Hinch
Index to my micropython libraries.

MrRobot
Posts: 31
Joined: Mon May 11, 2020 11:30 am

Re: Max number of files to iterate with os.listdir()

Post by MrRobot » Tue Jun 23, 2020 11:45 am

Ah that's a shame.

Regarding your idea of a dict maintained by ujson, wouldn't I need to load this dict every time I want to store more data?
Wouldn't this hit the same bottleneck with the 2MB of on-board memory?

Do you have a code snippet as an example?

pythoncoder
Posts: 5956
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: Max number of files to iterate with os.listdir()

Post by pythoncoder » Wed Jun 24, 2020 10:31 am

Here is the kind of thing I have in mind. In this instance the queue of outstanding items is stored as a list.

Code: Select all

import time
import ujson

def log_data(filename):
    # get_data(), link_is_open() and send() are application-specific
    # and must be supplied by the caller.
    try:
        with open(filename, 'r') as f:  # Resume after power failure
            q = ujson.load(f)
        q_changed = False
    except OSError:  # First time run: no file yet created
        q = []
        q_changed = True  # Force a write
    while True:
        d = get_data()  # Return some kind of object
        if link_is_open():  # Check communication status
            for item in q:
                send(item)  # Transmit the object
            q = []  # Queue is now empty
            q_changed = True
            send(d)
        else:
            q.append(d)
            q_changed = True
        if q_changed:
            q_changed = False
            with open(filename, 'w') as f:
                ujson.dump(q, f)  # In case of power failure
        time.sleep(10)  # However long you want to wait between samples
Note I haven't actually tested this ;)
Peter Hinch
Index to my micropython libraries.
