fastest way to fill large bytearrays? (Neopixel, APA102 et al)
So I'm working on a project with 120 APA102 LEDs and it's insanely slow. The same goes for Neopixels. The problem isn't the writing itself (that takes about 2 ms) but rather filling / overwriting the bytearray (which takes about 100 ms for the 120*4 values).
Any pointers on how I can speed this up?
Re: fastest way to fill large bytearrays? (Neopixel, APA102 et al)
Update: this takes 42 ms to run. Is there any way to speed it up?
Code: Select all
import utime
ba = bytearray(120*4)
start = utime.ticks_us()
for i in range(120):
    ba[i*4+0] = 255
    ba[i*4+1] = 255
    ba[i*4+2] = 255
    ba[i*4+3] = 255
end = utime.ticks_us()
print("Total time: "+str(end-start))
Re: fastest way to fill large bytearrays? (Neopixel, APA102 et al)
First, if this code wasn’t in a function, put it in one: local variables are faster than global ones.
This takes 9 ms on my ESP8266:
Code: Select all
import utime

def run():
    ba = bytearray(120*4)
    start = utime.ticks_us()
    for i in range(120):
        ba[i*4+0] = 255
        ba[i*4+1] = 255
        ba[i*4+2] = 255
        ba[i*4+3] = 255
    end = utime.ticks_us()
    print("Total time: "+str(end-start))
This is my first guess (get rid of those redundant multiplications) and takes 5.8 ms (edited: the loop runs to 480, not 120; the 1.6 ms I first posted came from the wrong bound):
Code: Select all
import utime

def run():
    ba = bytearray(120*4)
    start = utime.ticks_us()
    i = 0
    while i < 480:
        ba[i] = 255
        i += 1
        ba[i] = 255
        i += 1
        ba[i] = 255
        i += 1
        ba[i] = 255
        i += 1
    end = utime.ticks_us()
    print("Total time: "+str(end-start))
For more tips, see Damien’s talk on Writing Fast and Efficient MicroPython (from the PyCon AU topic).
Last edited by Christian Walther on Thu Mar 07, 2019 6:22 pm, edited 1 time in total.
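A related variant, sketched here and not timed anywhere in this thread, keeps the "no multiplications" idea but lets range() do the stepping. The function name fill_pixels and its parameters are made up for illustration; whether it beats the unrolled while loop on a given board would have to be measured.

```python
def fill_pixels(buf, b0, b1, b2, b3):
    # Hypothetical helper: write the same 4-byte pattern into every pixel slot.
    # range() with a step of 4 yields the start index of each pixel directly,
    # so no i*4 multiplication is needed inside the loop.
    for i in range(0, len(buf), 4):
        buf[i] = b0
        buf[i + 1] = b1
        buf[i + 2] = b2
        buf[i + 3] = b3


ba = bytearray(120 * 4)
fill_pixels(ba, 255, 255, 255, 255)
```

Because everything happens inside a function, buf, i and the four byte values are all local variables, which is the point made above about locals being faster than globals.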
Re: fastest way to fill large bytearrays? (Neopixel, APA102 et al)
I think your while loop should be while i < 480, since 480 is the size of the bytearray. You want 120 iterations of the while loop, and each iteration increments i by 4.
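The effect of the loop bound can be checked without timing anything. A small sketch (the helper names fill_until and count_set are made up here) runs the same unrolled loop with the bound as a parameter and counts how many bytes were actually written:

```python
def fill_until(bound):
    # Same unrolled loop as in the post above, with the bound as a parameter.
    ba = bytearray(120 * 4)
    i = 0
    while i < bound:
        ba[i] = 255
        i += 1
        ba[i] = 255
        i += 1
        ba[i] = 255
        i += 1
        ba[i] = 255
        i += 1
    return ba


def count_set(buf):
    # Number of bytes that were written with 255.
    return sum(1 for b in buf if b == 255)


print(count_set(fill_until(120)))  # prints 120
print(count_set(fill_until(480)))  # prints 480
```

With a bound of 120 only the first 120 bytes (30 pixels) are filled, so a timing taken with that bound covers a quarter of the work.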
Re: fastest way to fill large bytearrays? (Neopixel, APA102 et al)
Also, if you are using APA102's you might want to take a look at my micropython-dotstar library. As discussed in this forum post, @bill-e was able to update 3000 APA102's in about a second using it. Hope that helps!
Re: fastest way to fill large bytearrays? (Neopixel, APA102 et al)
mattyt wrote: ↑Thu Mar 07, 2019 12:26 am
Also, if you are using APA102's you might want to take a look at my micropython-dotstar library. As discussed in this forum post, @bill-e was able to update 3000 APA102's in about a second using it. Hope that helps!
Thanks, but the problem isn't updating the LEDs, it's writing the values into the bytearray...
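Since the cost is in the per-byte Python loop, one option that nobody benchmarked in this thread is to build a constant frame once and copy it in with a single slice assignment, so the copying is done by the interpreter rather than by Python bytecode. This is only a sketch under assumptions: it relies on the port having bytearray slice assignment enabled, and the names NUM_LEDS, white_frame and show_white are invented for illustration.

```python
NUM_LEDS = 120  # assumed strip length, as in the question

# Build the constant frame once, outside the time-critical path.
white_frame = bytes([255, 255, 255, 255]) * NUM_LEDS

ba = bytearray(NUM_LEDS * 4)


def show_white(buf):
    # Hypothetical helper: overwrite the whole buffer in one operation.
    # A same-length slice assignment changes the contents in place, so
    # buf stays the same object and other references to it remain valid.
    buf[:] = white_frame


show_white(ba)
```

This only helps when whole frames (or contiguous runs of pixels) share a precomputed pattern; per-pixel effects still need a loop like the ones discussed in the other posts.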
Re: fastest way to fill large bytearrays? (Neopixel, APA102 et al)
Doh! Thanks, Dave. That’s what I get for trying to solve technical questions with the mushy brain that comes from a cold. With the correct loop bounds, it takes 5.8 ms.
Re: fastest way to fill large bytearrays? (Neopixel, APA102 et al)
I ran the following tests on a pyboard 1.0 (test code first, then the results, in microseconds):
Code: Select all
import utime

ba = bytearray(120*4)
start = utime.ticks_us()
for i in range(120):
    ba[i*4+0] = 255
    ba[i*4+1] = 255
    ba[i*4+2] = 255
    ba[i*4+3] = 255
end = utime.ticks_us()
print('Total time: {:4d} straight python'.format(end - start))

@micropython.native
def fill_native(ba):
    for i in range(len(ba)):
        ba[i] = 255

start = utime.ticks_us()
ba2 = bytearray(120*4)
fill_native(ba2)
end = utime.ticks_us()
print('Total time: {:4d} native emitter'.format(end - start))

@micropython.viper
def fill_viper(ba: ptr8, ba_len: int):
    for i in range(ba_len):
        ba[i] = 255

start = utime.ticks_us()
ba3 = bytearray(120*4)
fill_viper(ba3, len(ba3))
end = utime.ticks_us()
print('Total time: {:4d} viper emitter - 1 byte at a time'.format(end - start))

@micropython.viper
def fill_viper4(ba: ptr32, ba_len: int):
    for i in range(ba_len):
        ba[i] = -1

start = utime.ticks_us()
ba4 = bytearray(120*4)
fill_viper4(ba4, len(ba4)//4)
end = utime.ticks_us()
print('Total time: {:4d} viper emitter - 4 bytes at a time'.format(end - start))

@micropython.asm_thumb
def fill_asm(r0, r1): # buf(r0) len(r1)
    mov(r2, 0xff)
    add(r1, r1, r0) # buf_end(r1) = len(r1) + buf(r0)
    label(loop)
    cmp(r0, r1)
    bge(endloop) # branch if buf(r0) >= buf_end(r1)
    strb(r2, [r0, 0]) # *buf++ = 0xff
    add(r0, 1)
    b(loop)
    label(endloop)

start = utime.ticks_us()
ba5 = bytearray(120*4)
fill_asm(ba5, len(ba5))
end = utime.ticks_us()
print('Total time: {:4d} asm - 1 byte at a time'.format(end - start))

@micropython.asm_thumb
def fill_asm4(r0, r1): # buf(r0) len(r1)
    movw(r2, 0xffff)
    movt(r2, 0xffff)
    add(r1, r1, r0) # buf_end(r1) = len(r1) + buf(r0)
    label(loop)
    cmp(r0, r1)
    bge(endloop) # branch if buf(r0) >= buf_end(r1)
    str(r2, [r0, 0]) # *buf++ = 0xffffffff
    add(r0, 4)
    b(loop)
    label(endloop)

start = utime.ticks_us()
ba6 = bytearray(120*4)
fill_asm4(ba6, len(ba6))
end = utime.ticks_us()
print('Total time: {:4d} asm - 4 bytes at a time'.format(end - start))
Code: Select all
>>> import test
Total time: 4750 straight python
Total time: 1579 native emitter
Total time: 229 viper emitter - 1 byte at a time
Total time: 135 viper emitter - 4 bytes at a time
Total time: 121 asm - 1 byte at a time
Total time: 82 asm - 4 bytes at a time
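To put these numbers in proportion, the speedup of each variant over the straight Python loop can be worked out from the figures above. The dictionary below only restates the posted pyboard 1.0 results; the variable names are illustrative.

```python
# Timings in microseconds, copied from the pyboard 1.0 results above.
results_us = {
    "straight python": 4750,
    "native emitter": 1579,
    "viper, 1 byte": 229,
    "viper, 4 bytes": 135,
    "asm, 1 byte": 121,
    "asm, 4 bytes": 82,
}

baseline = results_us["straight python"]

# Speedup factor of each variant relative to the plain Python loop.
speedups = {}
for name in results_us:
    speedups[name] = baseline / results_us[name]
    print("{:16s} {:5.1f}x".format(name, speedups[name]))
```

That works out to roughly 3x for the native emitter, about 21x and 35x for the two viper versions, and about 39x and 58x for the two assembler versions.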
- rcolistete
Re: fastest way to fill large bytearrays? (Neopixel, APA102 et al)
On Pyboard D (SF2W), "MicroPython v1.9.4-925-g8edf1205f-dirty on 2019-01-16; PYBD_SF2 with STM32F722IEK":
Code of post #2: 4.0 ms (120 MHz) / 2.4 ms (216 MHz) instead of 42 ms (board not stated).
Code of post #3, 1st code: 2.5 ms (120 MHz) / 1.5 ms (216 MHz) instead of 9 ms (ESP8266).
Code of dhylands's test @ 120 MHz:
Code: Select all
>>> import test
Total time: 4289 straight python
Total time: 1287 native emitter
Total time: 144 viper emitter - 1 byte at a time
Total time: 126 viper emitter - 4 bytes at a time
Total time: 96 asm - 1 byte at a time
Total time: 84 asm - 4 bytes at a time
@ 216 MHz:
Code: Select all
>>> machine.freq(216000000)
>>> import test
Total time: 2471 straight python
Total time: 739 native emitter
Total time: 99 viper emitter - 1 byte at a time
Total time: 79 viper emitter - 4 bytes at a time
Total time: 61 asm - 1 byte at a time
Total time: 48 asm - 4 bytes at a time
So Pyboard D is:
- a lot faster than ESP8266;
- @ 120 MHz a little bit faster than Pyboard v1.0/1.1 in almost all tests, @ 216 MHz a lot faster than Pyboard v1.0/1.1.
Incredible to have native, viper and asm decorators already working on Pyboard D, with high performance!
- pythoncoder
Viper performance
The other take-away from these tests is the performance of Viper. A penalty of well under a factor of 2 relative to assembler is astounding. Perhaps the figures for the faster options are dominated by the fixed overhead of a function call.
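The viper-to-assembler ratio can be checked against the pyboard 1.0 timings posted earlier in the thread. A quick calculation, with values in microseconds taken from those results (the variable names are illustrative):

```python
# (viper_us, asm_us) pairs from the pyboard 1.0 results earlier in the thread.
pairs = {
    "1 byte at a time": (229, 121),
    "4 bytes at a time": (135, 82),
}

penalties = {}
for name in pairs:
    viper_us, asm_us = pairs[name]
    # How many times longer the viper version takes than the asm version.
    penalties[name] = viper_us / asm_us
    print("{}: viper takes {:.2f}x the asm time".format(name, penalties[name]))
```

Both ratios come out below 2 (about 1.89 and 1.65). Note that in the test code the timed region for these variants also includes allocating the 480-byte bytearray, which adds to the fixed overhead alongside the function call.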
Peter Hinch
Index to my micropython libraries.