Using persisting bytes array efficiently

rosenrot · Post by **rosenrot** » Fri Jun 02, 2017 7:47 pm

My scenario is as follows: I have to change an array of 1000 bytes from different functions (not in parallel). I found out that, depending on how I do it, I run into MemoryErrors. What is the most efficient way to do the following:

create bytes(1000)
func1 change 500 of these bytes, send all 1000 to func2
func1 change 800 of these bytes, send all 1000 to func2

This is what I do right now:
ARR = bytes(1000)
def func1():
    global ARR
    while True:
        ARR = bytes([1, 2]) * 500
        func2(ARR)
        ...
Is this the way to go? I would like ARR to stay in memory all the time to avoid fragmentation.

My requirements:
Changing these 500 bytes should be as fast as possible.

Beta_Ravener · Post by **Beta_Ravener** » Fri Jun 02, 2017 8:36 pm

The problem with bytes is that it is an immutable collection: https://docs.python.org/3/library/funct ... func-bytes. That means no part of it can be changed; you have to create a new object that reflects the changes. If you changed, for example, a single byte in a bytes object of length 1000, you'd have to allocate a new bytes object of length 1000. That's why you run into memory problems: the ESP does not have much. It's true that as soon as you lose the reference to the first object it becomes eligible for garbage collection, but that may not happen before you use up all your memory.

As the Python docs say, bytearray (funny that you actually got it right in the title but not in the code) is mutable, so with that one you should be able to do most operations without allocating new memory.
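
A minimal sketch of the difference, in plain Python with nothing device-specific (the variable names are only for illustration):

```python
data = bytes(1000)            # immutable: 1000 zero bytes
try:
    data[0] = 1               # item assignment is not allowed on bytes
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

# "Changing" a bytes object really means building a whole new one.
changed = bytes([1]) + data[1:]

buf = bytearray(1000)         # mutable: also 1000 zero bytes
buf_id = id(buf)
buf[0] = 1                    # modifies the existing object
buf[2:4] = b"\x07\x08"        # same-length slice assignment also modifies it
same_object = id(buf) == buf_id
```

After this runs, buf is still the same object with the same length, while changed is a second 1000-byte object sitting next to data.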

rosenrot · Post by **rosenrot** » Fri Jun 02, 2017 8:53 pm

What is the fastest way to fill the bytearray?

So far I did:

ARR = bytes([1,2])*500 which allocates 1000 bytes

When I do the same thing with the bytearray

ARR = bytearray([1,2]*500) it allocates 4000 bytes. It seems to allocate integers.

Actually:

ARR = bytearray(1000) allocates 1000 bytes
ARR = bytearray([1]*1000) allocates 4000 bytes

Beta_Ravener · Post by **Beta_Ravener** » Fri Jun 02, 2017 9:12 pm

I don't have a device around to play with, but I guess it depends on how it's implemented in MicroPython. First of all, [1,2]*500 is not really a good idea because that call creates a temporary list that is then converted to a bytearray. Probably because the constructor sees a list of integers, it unpacks every value as 4 bytes (whereas the bytes constructor seems to behave differently). Anyway, you are still wasting a lot of memory creating that list.

I'd go with the bytearray(1000) constructor and use indexing with the [] operator to make any changes. It might be less Pythonic, but when you are dealing with large pieces of memory you just have to think about what the code is going to do. The same goes for strings: concatenating with the + operator (for example when building an HTTP request) will cause a lot of memory allocations, because strings are also immutable, and you will fragment your memory pretty fast.
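
To illustrate the string case, here is a rough sketch of assembling a small HTTP request in a preallocated bytearray instead of concatenating with +. The helper name build_request, the buffer size of 128, and the host and path values are all made up for the example.

```python
def build_request(buf, host, path):
    """Write a GET request into buf (a preallocated bytearray); return its length."""
    view = memoryview(buf)
    pos = 0
    for part in (b"GET ", path, b" HTTP/1.0\r\nHost: ", host, b"\r\n\r\n"):
        end = pos + len(part)
        if end > len(buf):
            raise ValueError("buffer too small")
        view[pos:end] = part      # copy the piece into the existing buffer
        pos = end
    return pos


request_buf = bytearray(128)      # allocated once, reused for every request
n = build_request(request_buf, b"example.com", b"/index.html")
request = bytes(request_buf[:n])  # copied here only so the result is easy to inspect
```

The point is that the large buffer is created once and only overwritten afterwards; the pieces are written at increasing offsets instead of producing a new, longer string at every + step.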

rosenrot · Post by **rosenrot** » Fri Jun 02, 2017 9:50 pm

Beta_Ravener wrote: I don't have a device around to play with, but I guess it depends on how it's implemented in MicroPython. First of all, [1,2]*500 is not really a good idea because that call creates a temporary list that is then converted to a bytearray. Probably because the constructor sees a list of integers, it unpacks every value as 4 bytes (whereas the bytes constructor seems to behave differently). Anyway, you are still wasting a lot of memory creating that list.

I'd go with the bytearray(1000) constructor and use indexing with the [] operator to make any changes. It might be less Pythonic, but when you are dealing with large pieces of memory you just have to think about what the code is going to do. The same goes for strings: concatenating with the + operator (for example when building an HTTP request) will cause a lot of memory allocations, because strings are also immutable, and you will fragment your memory pretty fast.

Of course, I see that iterating and setting values one by one would be memory efficient. However, it increases the time to do it by a factor of about 6.

ARR = bytearray(1000)
ARR = bytes([1, 2]) * 500          # takes 10ms
for i in range(0, 1000, 2):
    ARR[i:i + 2] = bytes([1, 2])   # whole loop takes 60ms

Is there an option that does not allocate so much memory but still does this operation quickly? Of course I can find a trade-off by changing the bytes() size to len(ARR)/2 or something similar.

I'm wondering if someone comes up with a better idea.

Beta_Ravener · Post by **Beta_Ravener** » Fri Jun 02, 2017 10:34 pm

Well, now you're creating 500 temporary lists, plus a bytes object for each of them. Did you try something really simple like:

arr = bytearray(1000)
for i in range(0, 1000):
    arr[i] = (i % 2) + 1

rosenrot · Post by **rosenrot** » Sat Jun 03, 2017 5:36 am

Beta_Ravener wrote: Well, now you're creating 500 temporary lists, plus a bytes object for each of them. Did you try something really simple like:

arr = bytearray(1000)
for i in range(0, 1000):
    arr[i] = (i % 2) + 1

I know, but doing it with the loop increases the computation time by a factor of 6. As I wrote above, filling the array then takes 60ms.

pythoncoder · Post by **pythoncoder** » Sat Jun 03, 2017 6:30 am

@Beta_Ravener The ESP8266 can be astonishingly slow. On the Pyboard @rosenrot's code runs in about 13ms. I can't even get the times you get on the reference board without upping the clock to 160MHz. Then I'm seeing 75ms.

The following does it in 11.7ms at 80MHz, 6.7ms at 160MHz.

Code: Select all

from utime import ticks_us, ticks_diff
from micropython import const
_SIZE = const(20)
arr = bytearray(1000)
t = ticks_us()  # Start timing
at = bytearray(_SIZE)  # Temporary array
for x in range(_SIZE):
    at[x] = (x & 1) + 1

m = memoryview(arr)
for x in range(0, 1000, _SIZE):
    m[x: x + _SIZE] = at
print(ticks_diff(ticks_us(), t))

From quick tests, a temporary array size of 20 seems approximately optimal, but don't ask me why.
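
For anyone who wants to experiment with the chunk size, the same idea can be wrapped in a function. This is only a sketch: the name fill_pattern and its parameters are made up here, and the default of 20 is simply the value from the quick tests above, not a tuned constant.

```python
def fill_pattern(buf, pattern, chunk_len=20):
    """Fill buf in place by repeating pattern, copying chunk_len bytes per step."""
    if chunk_len % len(pattern):
        raise ValueError("chunk_len must be a multiple of len(pattern)")
    # Build the small temporary chunk once.
    chunk = bytearray(chunk_len)
    for i in range(chunk_len):
        chunk[i] = pattern[i % len(pattern)]
    view = memoryview(buf)
    n = len(buf)
    full = n - n % chunk_len          # end of the last complete chunk
    for start in range(0, full, chunk_len):
        view[start:start + chunk_len] = chunk
    if full < n:                      # copy the remaining partial chunk, if any
        view[full:n] = memoryview(chunk)[:n - full]
    return buf


arr = bytearray(1000)
fill_pattern(arr, b"\x01\x02")        # arr now holds 1, 2, 1, 2, ...
```

A larger chunk_len means fewer loop iterations but a bigger temporary array, which is the trade-off discussed in this thread.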

rosenrot · Post by **rosenrot** » Sat Jun 03, 2017 8:04 am

@pythoncoder thanks for this example. For the record, I'm at 160MHz already. I was thinking one could find a trade-off between the for loop and a certain temporary array size.

However, I thought about moving this task to assembler. It has been a long time since I used asm, but I came up with the following for filling my array with 4-byte fragments.

Code: Select all

@micropython.asm_thumb
def padd(r0, r1, r2):
    label(LOOP)
    vldr(s0, [r2, 0])
    vstr(s0, [r0, 0])
    add(r0, 4)
    sub(r1, 1)
    bgt(LOOP)

Could I place this code in an ordinary Python file, or does it have to be compiled as mpy?

However, both methods fail with an "invalid micropython decorator" error. Is there something I have to do to enable asm code?

https://micropython.org/resources/docs/ ... _tips.html

I cannot find any information about settings that need adjusting in order to use it.

OK, asm_thumb2 seems not to be available for the esp8266. So I have to go with viper?

**EDIT** So far it seems viper is not supported in frozen modules. I went for my own C function, but the tutorial from http://micropython-dev-docs.readthedocs ... g%20module fails with the following error:

Code: Select all

CC mymodule.c
mymodule.c:16:5: error: unknown field 'name' specified in initializer
     .name = MP_QSTR_mymodule,
     ^
mymodule.c:16:5: error: initialization makes pointer from integer without a cast [-Werror]
mymodule.c:16:5: error: (near initialization for 'mp_module_mymodule.globals') [-Werror]
cc1: all warnings being treated as errors
../py/mkrules.mk:47: recipe for target 'build/mymodule.o' failed
make: *** [build/mymodule.o] Error 1

**EDIT**

Code: Select all

.name = MP_QSTR_mymodule,

is deprecated. Removing it solved the issue. Now I am trying to pass a bytearray to the C function.

pythoncoder · Post by **pythoncoder** » Sun Jun 04, 2017 7:45 am

rosenrot wrote:...OK, asm_thumb2 seems not to be available for the esp8266...

Because that decorator is for ARM processors supporting the ARM Thumb machine code. The ESP8266 uses a different architecture and hence a different assembler. The decorator is

Code: Select all

@micropython.asm_xtensa

This thread offers some guidance: viewtopic.php?f=16&t=3238&p=18999&hilit ... ler#p18999. As far as I know, the subset supported by MicroPython is not yet documented, and you may need to study the source code to establish which instructions are supported.

As you have discovered, frozen bytecode can include only Python bytecode, not machine code. The solution is to put your assembler, Viper or Native code in a separate small module and store it in the filesystem.

MicroPython Forum (Archive)