ulab, or what you will - numpy on bare metal

C programming, build, interpreter/VM.
Target audience: MicroPython Developers.
rcolistete
Posts: 265
Joined: Thu Dec 31, 2015 3:12 pm
Location: Brazil

Re: ulab, or what you will - numpy on bare metal

Post by rcolistete » Thu Jul 16, 2020 12:19 am

rcolistete wrote:
Sat Jul 11, 2020 2:14 pm
I'll later post these firmware images with FP32 and FP64 and announce here.
Finally, 14 firmware variants with the 'ulab' module are available for 5 Pyboards; see the topics:
- Pyboard v1.1/Lite v1.0 firmwares with single/double precision, threads, network and ulab module;
- Pyboard D firmwares with single/double precision, threads and ulab module.
Another 10 firmware variants will become available once double precision + ulab is compatible with Pyboard v1.1/Lite v1.0/D SF2/D SF3.

@v923z, feel free to use the firmware files anywhere.
My "MicroPython Samples". My "MicroPython Firmwares" with many options (double precision, ulab, etc).

pythoncoder
Posts: 4374
Joined: Fri Jul 18, 2014 8:01 am
Location: UK

Re: ulab, or what you will - numpy on bare metal

Post by pythoncoder » Thu Jul 16, 2020 5:39 am

rcolistete wrote:
Wed Jul 15, 2020 11:51 pm
...
I'll do some benchmarking of the Pyboard D SF2 next week; some timing measurements may show how much slower a module in QSPI flash is (both importing and running).
That will be very interesting!
Peter Hinch

v923z
Posts: 149
Joined: Mon Dec 28, 2015 6:19 pm

Re: ulab, or what you will - numpy on bare metal

Post by v923z » Thu Jul 16, 2020 6:37 am

rcolistete wrote:
Wed Jul 15, 2020 11:58 pm
The issue happens with Pyboard D SF2/SF3, which have 512 kB of internal flash memory, of which 480 kB is FLASH_APP, so only 41 kB is left with MicroPython v1.12 without 'ulab'. As Pyboard D SF2/SF3 have a 2 MB QSPI flash with 2048 kB as FLASH_EXT, of which approx. 1.4 MB is free, it is worth knowing how to move modules from internal to QSPI flash memory.
@rcolistete As a possible workaround, I could revive the 1D implementation of ulab. That significantly reduces code size, since you never have to check whether you have a matrix or a straight array, and you never have to implement nested loops. I floated this idea a couple of months ago, but there wasn't much interest in it.

rcolistete
Posts: 265
Joined: Thu Dec 31, 2015 3:12 pm
Location: Brazil

Re: ulab, or what you will - numpy on bare metal

Post by rcolistete » Thu Jul 16, 2020 6:50 am

v923z wrote:
Thu Jul 16, 2020 6:37 am
rcolistete wrote:
Wed Jul 15, 2020 11:58 pm
The issue happens with Pyboard D SF2/SF3, which have 512 kB of internal flash memory, of which 480 kB is FLASH_APP, so only 41 kB is left with MicroPython v1.12 without 'ulab'. As Pyboard D SF2/SF3 have a 2 MB QSPI flash with 2048 kB as FLASH_EXT, of which approx. 1.4 MB is free, it is worth knowing how to move modules from internal to QSPI flash memory.
@rcolistete As a possible workaround, I could revive the 1D implementation of ulab. That significantly reduces code size, since you never have to check whether you have a matrix or a straight array, and you never have to implement nested loops. I floated this idea a couple of months ago, but there wasn't much interest in it.
Please do not remove features from ulab. I want the opposite: the full ulab plus some .py files as frozen modules.

Moving modules from the Pyboard D SF2/SF3 internal flash to QSPI flash is very simple once you know how: just 1 line of code.

v923z
Posts: 149
Joined: Mon Dec 28, 2015 6:19 pm

Re: ulab, or what you will - numpy on bare metal

Post by v923z » Thu Jul 16, 2020 6:00 pm

rcolistete wrote:
Thu Jul 16, 2020 6:50 am
v923z wrote:
Thu Jul 16, 2020 6:37 am
@rcolistete As a possible workaround, I could revive the 1D implementation of ulab. That significantly reduces code size, since you never have to check whether you have a matrix or a straight array, and you never have to implement nested loops. I floated this idea a couple of months ago, but there wasn't much interest in it.
Please do not remove features from ulab. I want the opposite: the full ulab plus some .py files as frozen modules.
I meant it as an alternative, not as a replacement.

rcolistete
Posts: 265
Joined: Thu Dec 31, 2015 3:12 pm
Location: Brazil

Re: ulab, or what you will - numpy on bare metal

Post by rcolistete » Thu Jul 23, 2020 7:32 am

Another option is to move the 'ulab' module to FLASH_EXT.
Just add the following at line 51 of '../micropython/ports/stm32/boards/PYBD_SF2/f722_qspi.ld':


        *code/*(.text* .rodata*)
So moving any of these 4 options to FLASH_EXT (external 2 MB QSPI flash) allows the full ulab to be installed:
- .py frozen modules;
- 'lwip' native module;
- .py frozen modules + 'lwip' native module;
- ulab module.

When moving to FLASH_EXT:
- moving only the '.py frozen modules' is not enough to build firmware with ulab + DP (double precision) + threads for Pyboard D SF2/SF3;
- moving the ulab module is always enough to build Pyboard D SF2/SF3 firmware with ulab and any combination of SP/DP and threads.

I suggest that upstream MicroPython ship '../micropython/ports/stm32/boards/PYBD_SF2/f722_qspi.ld' with the following code added at line 51:


        *code/*(.text* .rodata*)       /* move ulab module to external QSPI */
/*        *frozen_content.o(.text* .rodata*)   */    /* move .py frozen modules to external QSPI */
/*        *lib/lwip/*(.text* .rodata*)         */    /* move lwip native module to external QSPI */
so that users compiling MicroPython with ulab, many frozen modules, etc., would have some options to choose from.

rcolistete
Posts: 265
Joined: Thu Dec 31, 2015 3:12 pm
Location: Brazil

Re: ulab, or what you will - numpy on bare metal

Post by rcolistete » Fri Jul 24, 2020 3:13 am

See the topic "ulab for ESP8266", where there is a MicroPython firmware with ulab for ESP8266, with some discussions.

rcolistete
Posts: 265
Joined: Thu Dec 31, 2015 3:12 pm
Location: Brazil

Re: ulab, or what you will - numpy on bare metal

Post by rcolistete » Mon Jul 27, 2020 1:46 pm

One extra option is to move only some 'ulab' sub-modules to FLASH_EXT (external 2 MB QSPI flash), instead of the full ulab.
Just add the following at line 51 of '../micropython/ports/stm32/boards/PYBD_SF2/f722_qspi.ld' (which is also used by Pyboard D SF3); this example moves the sub-modules 'compare' and 'user':


        *code/compare/*(.text* .rodata*)  
        *code/user/*(.text* .rodata*)
The advantages:
- granularity: we can choose which parts of 'ulab' stay in the faster 480 kB of FLASH_APP (inside the 512 kB internal flash memory) and which ones are moved to the slower FLASH_EXT (external 2 MB QSPI flash);
- so the parts of 'ulab' that are more time-critical from the user's point of view (where high performance is needed) can stay in FLASH_APP.

When moving to FLASH_EXT:
- moving only the '.py frozen modules' is enough to build firmware with ulab + SP (single precision) + optionally threads for Pyboard D SF2/SF3;
- moving only the '.py frozen modules' is not enough to build firmware with ulab + DP (double precision) + optionally threads for Pyboard D SF2/SF3, so in this case it is worth selecting the minimum number of sub-modules that are not time-critical and moving them to FLASH_EXT, to avoid overflowing FLASH_APP.

To understand the speed difference between the internal and external flash memories, see these preliminary benchmarks for 'ulab.fft.fft()' with 1024 points, on Pyboard D SF2 + MicroPython v1.12 with ulab, default clock (120 MHz), without threads:
- full ulab in 512 kB internal flash memory, FP32: 1.873 ms;
- full ulab in external 2 MB QSPI flash, FP32: 2.070 ms (10.5% more);
- full ulab in 512 kB internal flash memory, FP64: 31.856 ms;
- full ulab in external 2 MB QSPI flash, FP64: 60.528 ms (90.0% more).
So :
- for single precision (SP/FP32), the overhead is small (10.5%) as the MCU does many FP32 calculations in hardware;
- for double precision (DP/FP64), the overhead is huge (90.0%) as the MCU does all FP64 calculations in software, reading more code from the slower external flash memory.
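
For anyone who wants to check the percentages, here is a minimal sketch that recomputes the QSPI overhead from the four timings listed above. The function name `overhead_percent` and the data layout are only illustrative assumptions, not part of ulab or MicroPython.

```python
# Timings in milliseconds, copied from the benchmark list in this post.
# Each entry: (precision label, internal flash time, QSPI flash time).
TIMINGS_MS = [
    ("FP32", 1.873, 2.070),
    ("FP64", 31.856, 60.528),
]


def overhead_percent(internal_ms, qspi_ms):
    """Extra run time of the QSPI build relative to the internal-flash build."""
    return (qspi_ms / internal_ms - 1.0) * 100.0


for precision, internal_ms, qspi_ms in TIMINGS_MS:
    pct = overhead_percent(internal_ms, qspi_ms)
    print("{}: {:.1f}% slower when run from QSPI flash".format(precision, pct))
```

The same helper can be reused with timings of other ulab functions to decide which sub-modules tolerate the move to FLASH_EXT.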

About using the Pyboard D SF2/SF3 flash memories, the logic seems to be:
- use the maximum capacity (480 kB) of FLASH_APP inside the 512 kB internal flash memory;
- move to the slower FLASH_EXT (external 2 MB QSPI flash) only parts (modules and sub-modules) which don't fit in FLASH_APP and aren't time-critical.
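
The two rules above can be sketched as a small planning helper. This is only an illustration under stated assumptions: `plan_placement` is a hypothetical name, the "largest first" order is just one heuristic for moving as few parts as possible, and every size in the example except the 480 kB capacity and the 439 kB base firmware (480 kB minus the 41 kB left) is made up. Real sizes would have to be read from the linker map of an actual build.

```python
FLASH_APP_KB = 480  # FLASH_APP capacity quoted in this thread


def plan_placement(parts, capacity_kb=FLASH_APP_KB):
    """Return (stay_internal, move_to_ext) as two lists of part names.

    parts: list of (name, size_kb, time_critical) tuples.
    Parts that are not time-critical are moved, largest first,
    until everything that remains fits in capacity_kb.
    """
    total = sum(size for _, size, _ in parts)
    movable = sorted((p for p in parts if not p[2]),
                     key=lambda p: p[1], reverse=True)
    moved = []
    for name, size, _ in movable:
        if total <= capacity_kb:
            break
        moved.append(name)
        total -= size
    if total > capacity_kb:
        raise ValueError("time-critical parts alone do not fit in FLASH_APP")
    stay = [p[0] for p in parts if p[0] not in moved]
    return stay, moved


# Hypothetical sizes in kB, except the 439 kB base firmware.
example_parts = [
    ("base firmware", 439, True),
    ("ulab fft", 20, True),
    ("ulab compare", 15, False),
    ("ulab user", 5, False),
    ("frozen .py modules", 30, False),
]
stay, moved = plan_placement(example_parts)
print("stay in FLASH_APP:", stay)
print("move to FLASH_EXT:", moved)
```

Each name in the second list would then correspond to one extra line in 'f722_qspi.ld', as in the linker fragments shown earlier.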

rcolistete
Posts: 265
Joined: Thu Dec 31, 2015 3:12 pm
Location: Brazil

Re: ulab, or what you will - numpy on bare metal

Post by rcolistete » Mon Jul 27, 2020 1:54 pm

My "MicroPython Samples". My "MicroPython Firmwares" with many options (double precision, ulab, etc).

v923z
Posts: 149
Joined: Mon Dec 28, 2015 6:19 pm

Re: ulab, or what you will - numpy on bare metal

Post by v923z » Mon Jul 27, 2020 5:44 pm

rcolistete wrote:
Mon Jul 27, 2020 1:54 pm
25 new MicroPython firmwares with ulab v0.54 (released on July 24, 2020) were published:
- 11 with combinations of sp, dp, sp & _thread, dp & _thread, sp & network, dp & network for Pyboard v1.1/Lite v1.0;
- 12 with combinations of sp, dp, sp & _thread, dp & _thread for Pyboard D SF2/SF3/SF6;
- 2 for ESP8266 >= 2 MB and 1 MB of flash (with reduced filesystem), using only sp, without threads, due to limitations of MicroPython on ESP8266.

So an ESP8266 ESP-01S board, which costs about US$1.05, can run ulab. 8-)
Roberto, thanks for the hard work, it is really exciting! I have added a link to your repository in the readme: https://github.com/v923z/micropython-ul ... /README.md
