How to combine bytearrays fast ?

General discussions and questions abound development of code with MicroPython that is not hardware specific.
Target audience: MicroPython Users.
fstengel
Posts: 55
Joined: Tue Apr 17, 2018 4:37 pm

Re: How to combine bytearrays fast ?

I just tried using the viper emitter on my ESP32. It seems that range(240) is strongly optimized. The following three functions

Code: Select all

``````@micropython.viper
def merge3(a:ptr8, b:ptr8, c:ptr8, d:ptr8, result:ptr8)->ptr8:
iteration_list = range(0,240)
for i in iteration_list:
j = int(i)
result[j] = a[j] + b[j] + c[j] + d[j]
return(result)

@micropython.viper
def merge4(a:ptr8, b:ptr8, c:ptr8, d:ptr8, result:ptr8)->ptr8:
for i in range(240):
j = int(i)
result[j] = a[j] + b[j] + c[j] + d[j]
return(result)

@micropython.viper
def merge5(a:ptr8, b:ptr8, c:ptr8, d:ptr8, result:ptr8)->ptr8:
j = int(0)
while j<240:
result[j] = a[j] + b[j] + c[j] + d[j]
j += 1
return(result)
``````
Run in roughly 890us, 195us and 190us. The time difference seems to be the time it takes to build the iterator and to use it. the direct for i in range(240): looks like it is a direct synonym of the while loop (i.e. using a counter, not an iterator)

jimmo
Posts: 1848
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: How to combine bytearrays fast ?

fstengel wrote:
Fri Feb 14, 2020 8:36 am
It seems that range(240) is strongly optimized.
Yup. It's specifically the "for i in range():" that is optimized. Calling range on its own behaves like normal. This applies to all emitters (bytecode, native, viper). See compile_for_stmt_optimised_range in compile.c
OutoftheBOTS_ wrote:
Fri Feb 14, 2020 6:53 am
Correct if I am wrong but it is the range(240) that takes up all the time because it doesn't create a list that it iterates...If you create a local list then it should run faster but use more memory.
range() (in Python 3) doesn't create a list -- it's a generator. (Back in Python 2, this was the difference between range() and xrange() -- This is why everyone had it drilled into them to use "for x in xrange()" in Python 2).

In general (for a generator function that isn't "range"), the following two snippets are identical:

Code: Select all

``````g = generator()
for i in g:
...
``````

Code: Select all

``````for i in generator():
...
``````
No list is created in either case so the memory usage is the same.

But, see above, range() is special, so the former snippet is actually much _worse_ when using range().
OutoftheBOTS_ wrote:
Fri Feb 14, 2020 6:53 am
through but rather is a global function that has to be called each iteration of the for loop.
The function isn't called for each iteration of the loop, rather each iteration pulls the next value out of the iterator. You're right that calling a global function is expensive, but advancing the iterator isn't so bad. (And again, this doesn't apply for range() ).

jimmo
Posts: 1848
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: How to combine bytearrays fast ?

If you're very curious, this is the optimisation in action:

Code: Select all

``````for i in range(10):
print(i)
``````
goes to

Code: Select all

``````00 LOAD_CONST_SMALL_INT 0
01 JUMP 27
04 DUP_TOP
05 STORE_NAME i
20 CALL_FUNCTION n=1 nkw=0
22 CALL_FUNCTION n=1 nkw=0
24 POP_TOP
27 DUP_TOP
29 BINARY_OP 0 __lt__
30 POP_JUMP_IF_TRUE 4
33 POP_TOP
35 RETURN_VALUE
``````

which is pretty close to what you get from

Code: Select all

``````i = 0
while i < 10:
print(i)
i += 1
``````
which becomes

Code: Select all

``````00 LOAD_CONST_SMALL_INT 0
01 STORE_NAME i
04 JUMP 33
19 CALL_FUNCTION n=1 nkw=0
21 CALL_FUNCTION n=1 nkw=0
23 POP_TOP
30 STORE_NAME i
38 BINARY_OP 0 __lt__
39 POP_JUMP_IF_TRUE 7
43 RETURN_VALUE
``````

OutoftheBOTS_
Posts: 790
Joined: Mon Nov 20, 2017 10:18 am

Re: How to combine bytearrays fast ?

jimmo wrote:
Fri Feb 14, 2020 10:33 am
If you're very curious, this is the optimisation in action:

Code: Select all

``````for i in range(10):
print(i)
``````
goes to

Code: Select all

``````00 LOAD_CONST_SMALL_INT 0
01 JUMP 27
04 DUP_TOP
05 STORE_NAME i
20 CALL_FUNCTION n=1 nkw=0
22 CALL_FUNCTION n=1 nkw=0
24 POP_TOP
27 DUP_TOP
29 BINARY_OP 0 __lt__
30 POP_JUMP_IF_TRUE 4
33 POP_TOP
35 RETURN_VALUE
``````

which is pretty close to what you get from

Code: Select all

``````i = 0
while i < 10:
print(i)
i += 1
``````
which becomes

Code: Select all

``````00 LOAD_CONST_SMALL_INT 0
01 STORE_NAME i
04 JUMP 33
19 CALL_FUNCTION n=1 nkw=0
21 CALL_FUNCTION n=1 nkw=0
23 POP_TOP
30 STORE_NAME i
38 BINARY_OP 0 __lt__
39 POP_JUMP_IF_TRUE 7
43 RETURN_VALUE
``````
Thanks for your detailed explanation as this helps me understand whats happening much better. I didn't know that range() had this level of optimization which of course is very useful because of how often it is used in most code.

Both sets of compiled instructions you posted seem to loop over 14 lines to get the job done so I assume we would expect them to both take the same amount of time to complete??

jimmo
Posts: 1848
Joined: Tue Aug 08, 2017 1:57 am
Location: Sydney, Australia
Contact:

Re: How to combine bytearrays fast ?

OutoftheBOTS_ wrote:
Fri Feb 14, 2020 11:02 pm
Both sets of compiled instructions you posted seem to loop over 14 lines to get the job done so I assume we would expect them to both take the same amount of time to complete??
At a high level ... yes, opcode dispatch has enough overhead relative to the actual work done that this isn't a bad approximation for the simple opcodes like add and jump etc.

When you start getting into LOAD_ATTR and LOAD_SUBSCR and stuff there's potentially a lot of stuff happening behind the scenes.