
Re: ulab, or what you will - numpy on bare metal

Posted: Thu Jul 16, 2020 12:19 am
by rcolistete
rcolistete wrote:
Sat Jul 11, 2020 2:14 pm
I'll later post these firmware images with FP32 and FP64 and announce here.
Finally, 14 firmware variants with the 'ulab' module are available for 5 Pyboards; see the topics:
- Pyboard v1.1/Lite v1.0 firmwares with single/double precision, threads, network and ulab module;
- Pyboard D firmwares with single/double precision, threads and ulab module.
Another 10 firmware variants will be available once double precision + ulab becomes compatible with Pyboard v1.1/Lite v1.0/D SF2/D SF3.

@v923z, feel free to use the firmware files in any place.

Re: ulab, or what you will - numpy on bare metal

Posted: Thu Jul 16, 2020 5:39 am
by pythoncoder
rcolistete wrote:
Wed Jul 15, 2020 11:51 pm
...
I'll do some benchmarking of Pyboard D SF2 in the following week; maybe some timing measurements can show how much slower a module in QSPI flash is (importing and running).
That will be very interesting!

Re: ulab, or what you will - numpy on bare metal

Posted: Thu Jul 16, 2020 6:37 am
by v923z
rcolistete wrote:
Wed Jul 15, 2020 11:58 pm
The issue happens with Pyboard D SF2/SF3, which has 512 kB of internal flash memory, of which 480 kB is FLASH_APP, so only 41 kB is left with MicroPython v1.12, without 'ulab'. As Pyboard D SF2/SF3 has 2 MB of QSPI flash (2048 kB as FLASH_EXT), with approx. 1.4 MB free, it would be better to know how to move modules from internal to QSPI flash memory.
@rcolistete As a possible workaround, I could revive the 1D implementation of ulab. That significantly reduces code size, since you never have to check whether you have a matrix or a straight array, and you never have to implement nested loops. I floated this idea a couple of months ago, but there wasn't much interest in it.

Re: ulab, or what you will - numpy on bare metal

Posted: Thu Jul 16, 2020 6:50 am
by rcolistete
v923z wrote:
Thu Jul 16, 2020 6:37 am
rcolistete wrote:
Wed Jul 15, 2020 11:58 pm
The issue happens with Pyboard D SF2/SF3, which has 512 kB of internal flash memory, of which 480 kB is FLASH_APP, so only 41 kB is left with MicroPython v1.12, without 'ulab'. As Pyboard D SF2/SF3 has 2 MB of QSPI flash (2048 kB as FLASH_EXT), with approx. 1.4 MB free, it would be better to know how to move modules from internal to QSPI flash memory.
@rcolistete As a possible workaround, I could revive the 1D implementation of ulab. That significantly reduces code size, since you never have to check whether you have a matrix or a straight array, and you never have to implement nested loops. I floated this idea a couple of months ago, but there wasn't much interest in it.
Please do not remove features from ulab. I want the opposite: ulab plus some .py files as frozen modules.

Moving modules from the Pyboard D SF2/SF3 internal flash to QSPI flash is very simple once you know how: just 1 line of code.

Re: ulab, or what you will - numpy on bare metal

Posted: Thu Jul 16, 2020 6:00 pm
by v923z
rcolistete wrote:
Thu Jul 16, 2020 6:50 am
v923z wrote:
Thu Jul 16, 2020 6:37 am
@rcolistete As a possible workaround, I could revive the 1D implementation of ulab. That significantly reduces code size, since you never have to check whether you have a matrix or a straight array, and you never have to implement nested loops. I floated this idea a couple of months ago, but there wasn't much interest in it.
Please do not remove features from ulab. I want the opposite: ulab plus some .py files as frozen modules.
I meant it as an alternative, not as a replacement.

Re: ulab, or what you will - numpy on bare metal

Posted: Thu Jul 23, 2020 7:32 am
by rcolistete
Another option is to move the 'ulab' module to FLASH_EXT.
Just add the following at line 51 of '../micropython/ports/stm32/boards/PYBD_SF2/f722_qspi.ld':

        *code/*(.text* .rodata*)
So full ulab can be installed by moving any one of these 4 options to FLASH_EXT (external 2 MB QSPI flash):
- .py frozen modules;
- 'lwip' native module;
- .py frozen modules + 'lwip' native module;
- ulab module.

When moving to FLASH_EXT:
- moving only the '.py frozen modules' is not enough to build firmware with ulab + DP (double precision) + threads for Pyboard D SF2/SF3;
- moving the ulab module is always enough to build Pyboard D SF2/SF3 firmware with ulab and any combination of SP/DP and threads.

I suggest that upstream MicroPython add the following code at line 51 of '../micropython/ports/stm32/boards/PYBD_SF2/f722_qspi.ld':

        *code/*(.text* .rodata*)       /* move ulab module to external QSPI */
/*        *frozen_content.o(.text* .rodata*)   */    /* move .py frozen modules to external QSPI */
/*        *lib/lwip/*(.text* .rodata*)         */    /* move lwip native module to external QSPI */
so that users compiling MicroPython with ulab, many frozen modules, etc. would have some options to choose from.

Re: ulab, or what you will - numpy on bare metal

Posted: Fri Jul 24, 2020 3:13 am
by rcolistete
See the topic "ulab for ESP8266", where there is a MicroPython firmware with ulab for ESP8266, with some discussions.

Re: ulab, or what you will - numpy on bare metal

Posted: Mon Jul 27, 2020 1:46 pm
by rcolistete
One extra option is to move individual 'ulab' sub-modules to FLASH_EXT (external 2 MB QSPI flash), instead of the full ulab module.
Just add the following at line 51 of '../micropython/ports/stm32/boards/PYBD_SF2/f722_qspi.ld' (which is also used by Pyboard D SF3); this example moves the sub-modules 'compare' and 'user':

        *code/compare/*(.text* .rodata*)  
        *code/user/*(.text* .rodata*)
The advantages:
- granularity: we can choose which parts of 'ulab' stay in the faster 480 kB of FLASH_APP (inside the 512 kB internal flash memory) and which ones are moved to the slower FLASH_EXT (external 2 MB QSPI flash);
- so the parts of 'ulab' that are more time-critical (where high performance is needed) from the user's point of view can stay in FLASH_APP.

When moving to FLASH_EXT:
- moving only the '.py frozen modules' is enough to build firmware with ulab + SP (single precision) + optionally threads for Pyboard D SF2/SF3;
- moving only the '.py frozen modules' is not enough to build firmware with ulab + DP (double precision) + optionally threads for Pyboard D SF2/SF3, so in this case it is worth selecting the smallest set of sub-modules that are not time-critical and moving them to FLASH_EXT, to avoid overflowing FLASH_APP.

To understand the speed difference between internal and external flash memory, see these preliminary benchmarks for 'ulab.fft.fft()' with 1024 points, on Pyboard D SF2 + MicroPython v1.12 with ulab, default clock (120 MHz), without threads:
- full ulab in 512 kB internal flash memory, FP32 : 1.873 ms;
- full ulab in external 2 MB QSPI flash, FP32 : 2.070 ms; (10.5% more)
- full ulab in 512 kB internal flash memory, FP64 : 31.856 ms;
- full ulab in external 2 MB QSPI flash, FP64 : 60.528 ms. (90.0% more)
So:
- for single precision (SP/FP32), the overhead is small (10.5%) as the MCU does many FP32 calculations in hardware;
- for double precision (DP/FP64), the overhead is huge (90.0%) as the MCU does all FP64 calculations in software, reading more code from the slower external flash memory.
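
For anyone who wants to recompute the percentages from the timings quoted above, here is a minimal sketch. The function name `overhead_percent` and the `timings` dictionary are only illustrative names; the four timing values are the ones from the benchmark list.

```python
def overhead_percent(internal_ms, external_ms):
    """Return how much slower the external-flash run is, in percent."""
    return (external_ms - internal_ms) / internal_ms * 100.0


# Timings in ms for ulab.fft.fft() of 1024 points, as quoted in this post:
# (ulab in internal flash, ulab in external QSPI flash)
timings = {
    "FP32": (1.873, 2.070),
    "FP64": (31.856, 60.528),
}

for precision, (internal_ms, external_ms) in timings.items():
    pct = overhead_percent(internal_ms, external_ms)
    print(precision, "overhead:", round(pct, 1), "%")
```

The overhead is computed relative to the internal-flash timing, which is how the 10.5% and 90.0% figures are obtained.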

As for using the Pyboard D SF2/SF3 flash memories, the logic seems to be:
- use the maximum capacity (480 kB) of FLASH_APP inside the 512 kB internal flash memory;
- move to the slower FLASH_EXT (external 2 MB QSPI flash) only parts (modules and sub-modules) which don't fit in FLASH_APP and aren't time-critical.
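
The rule of thumb above can be sketched as a small planning helper. This is only an illustration: the function `plan_placement`, the module sizes and the time-critical flags are hypothetical and not measured from any build. The only figure taken from this thread is the 41 kB left in FLASH_APP with MicroPython v1.12 without ulab.

```python
def plan_placement(modules, app_free_kb):
    """Split modules between FLASH_APP and FLASH_EXT.

    modules: list of (name, size_kb, time_critical) tuples.
    Time-critical modules are considered first, so they stay in the
    internal flash whenever they fit; anything that does not fit in
    the remaining FLASH_APP space is assigned to FLASH_EXT.
    """
    in_app, in_ext = [], []
    remaining = app_free_kb
    # False sorts before True, so this key puts time-critical modules first.
    for name, size_kb, time_critical in sorted(modules, key=lambda m: not m[2]):
        if size_kb <= remaining:
            in_app.append(name)
            remaining -= size_kb
        else:
            in_ext.append(name)
    return in_app, in_ext, remaining


# Hypothetical sizes in kB, for illustration only.
modules = [
    ("fft", 20, True),
    ("compare", 15, False),
    ("user", 10, False),
    ("linalg", 18, True),
]

in_app, in_ext, left = plan_placement(modules, app_free_kb=41)
print("FLASH_APP:", in_app)
print("FLASH_EXT:", in_ext)
print("free kB in FLASH_APP:", left)
```

Each name that ends up in the FLASH_EXT list would then correspond to one line of the '*code/<sub-module>/*(.text* .rodata*)' kind in the linker script.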

Re: ulab, or what you will - numpy on bare metal

Posted: Mon Jul 27, 2020 1:54 pm
by rcolistete

Re: ulab, or what you will - numpy on bare metal

Posted: Mon Jul 27, 2020 5:44 pm
by v923z
rcolistete wrote:
Mon Jul 27, 2020 1:54 pm
25 new MicroPython firmwares with ulab v0.54 (released on July 24, 2020) were published:
- 11 with combinations of sp, dp, sp & _thread, dp & _thread, sp & network, dp & network for Pyboard v1.1/Lite v1.0;
- 12 with combinations of sp, dp, sp & _thread, dp & _thread for Pyboard D SF2/SF3/SF6;
- 2 for ESP8266 >= 2 MB and 1 MB of flash (with reduced filesystem), using only sp, without threads, due to limitations of MicroPython on ESP8266.

So an ESP8266 ESP-01S board, which costs about US$1.05, can run ulab. 8-)
Roberto, thanks for the hard work, it is really exciting! I have added a link to your repository in the readme: https://github.com/v923z/micropython-ul ... /README.md