ulab, or what you will - numpy on bare metal
Re: ulab, or what you will - numpy on bare metal
chuckbook wrote: ↑Thu Oct 03, 2019 1:35 pm
Just to confirm, I verified the 0.8 ms execution time on a PYBD767 running at the default clock speed of 216 MHz. What was the hardware that gave the 2 ms execution time?
v923z wrote:
I think there is a misunderstanding stemming from this post:
chuckbook wrote: ↑Fri Sep 27, 2019 1:13 pm
Very impressive! Thanks for sharing this. 1k FFT (SP) in ~0.8 ms on PYBD, not bad.
In the original post, I quoted a measurement of 1.948 ms and claimed that the FFT could be computed in less than 2 ms, not 0.8 ms. So it is only a factor of two in speed, with basically no overhead in RAM, because, with the exception of a handful of temporary variables, the transform happens in place. It is also true that I did not overclock the CPU, so, to be fair, the gain is a bit larger: extrapolating your numbers, with the CPU clocked at 168 MHz the FFT in assembly would cost around 4.3 ms.

Thanks for the report! I measured on a pyboard v1.1, the gold standard. The PYBD767 has a different processor; that would explain the difference.
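For context on how such figures are typically obtained: a minimal timing sketch along these lines runs on both MicroPython and CPython (the time_us helper and the sorted workload are illustrative stand-ins, not the benchmark actually used above):

```python
try:
    from utime import ticks_us, ticks_diff  # MicroPython
except ImportError:
    # CPython fallback so the sketch can be tried on a desktop, too
    from time import perf_counter_ns

    def ticks_us():
        return perf_counter_ns() // 1000

    def ticks_diff(end, start):
        return end - start

def time_us(func, *args):
    """Return the wall-clock execution time of func(*args) in microseconds."""
    start = ticks_us()
    func(*args)
    return ticks_diff(ticks_us(), start)

# Stand-in workload; on the board this would be ulab's 1024-point FFT
elapsed = time_us(sorted, list(range(1024)))
print("elapsed [us]:", elapsed)
```

On a pyboard, one would simply import utime and time the FFT call itself the same way.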
Re: ulab, or what you will - numpy on bare metal
@v923z: Thanks for the info.
BTW, using our build settings (gcc version 8.2.0, -O2), the test gave 1.8 ms on PYBV11.
-O2 results in bigger code size, but it makes sense to use it if there is some spare flash available.
Re: ulab, or what you will - numpy on bare metal
Good to know. I used the standard settings; beyond passing the USER_C_MODULES parameter to make, I haven't modified anything in the makefile. My gcc version is 7.4.0. I don't know whether that changes anything.
As for the code size, did you actually measure that, or do you rely on gcc's claim? In the past, I compiled a lot for atmega, and the statement was the same: -O2 should produce faster but slightly bigger firmware. My experience was that the firmware almost always got slightly smaller, and I also gained speed. Hence my question.
Re: ulab, or what you will - numpy on bare metal
Here are the code sizes of -O2 and -Os build options.
Code:
text data bss dec hex filename
463704 40 28052 491796 78114 build-PYBV11_O2/firmware.elf
424484 40 28052 452576 6e7e0 build-PYBV11/firmware.elf
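As a sanity check on the table: the dec column is simply text + data + bss in decimal (hex is the same value in hexadecimal), so the -O2 build costs roughly 39 kB of extra flash:

```python
# Section sizes copied from the size(1) output above
o2 = {"text": 463704, "data": 40, "bss": 28052}   # build-PYBV11_O2
os_ = {"text": 424484, "data": 40, "bss": 28052}  # build-PYBV11 (-Os)

for name, sec in (("-O2", o2), ("-Os", os_)):
    dec = sec["text"] + sec["data"] + sec["bss"]
    print(f"{name}: dec={dec} hex={dec:x}")

# Flash cost of -O2 over -Os: only text differs (bss lives in RAM)
print("extra flash for -O2:", o2["text"] - os_["text"], "bytes")
```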
Re: ulab, or what you will - numpy on bare metal
Chuck,
chuckbook wrote: ↑Thu Oct 03, 2019 5:03 pm
Here are the code sizes of -O2 and -Os build options.
You are probably compiling other modules into the firmware, because my size with the -O2 switch is
Code:
text data bss dec hex filename
347832 40 27888 375760 5bbd0 firmware.elf
Without ulab, the size is about 16 kB smaller. I can't believe that the different compiler (gcc 8.2 vs. 7.4) would explain such a significant difference.
Re: ulab, or what you will - numpy on bare metal
Don't be confused by the absolute size of the code: our build includes a lot of additional features. I just wanted to demonstrate the code size increase from -Os to -O2.
commutative operations
Hi all,
I have tried to clean up the code for binary operations, and have run into a fundamental problem with commutative operators. Namely, this can be handled
Code:
import ulab
a = ulab.ndarray([1, 2, 3])
a*5
because the evaluation of the product operator begins with a, which is an ndarray, so I can handle that case. However, if I turn the operands around, like
Code:
import ulab
a = ulab.ndarray([1, 2, 3])
5*a
then I end up with a fatal error:
Code:
Traceback (most recent call last):
File "/dev/shm/micropython.py", line 7, in <module>
TypeError: unsupported types for __mul__: 'int', 'ndarray'
The reason is that evaluation now begins with 5, and an int cannot be multiplied by an object that is not a scalar. A similar problem exists for lists, but there it is solved in the mp_obj_int_binary_op_extra_cases function of objint.c (https://github.com/micropython/micropyt ... int.c#L370), where the operands are simply swapped, so the evaluation order becomes correct.
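At the Python level, the swap performed by mp_obj_int_binary_op_extra_cases corresponds to the __rmul__ protocol. A minimal sketch (MiniArray is a hypothetical toy class, not ulab's actual implementation):

```python
class MiniArray:
    """Toy stand-in for an ndarray that multiplies element-wise by a scalar."""
    def __init__(self, data):
        self.data = list(data)

    def __mul__(self, other):
        # a * 5: evaluation starts with the array, so we handle the scalar here
        return MiniArray(x * other for x in self.data)

    # 5 * a: int.__mul__ returns NotImplemented, so Python retries with the
    # operands swapped via __rmul__ -- the same trick objint.c plays in C
    __rmul__ = __mul__

a = MiniArray([1, 2, 3])
print((a * 5).data)   # [5, 10, 15]
print((5 * a).data)   # [5, 10, 15], thanks to __rmul__
```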
Now, runtime.c contains the generic_binary_op flag (https://github.com/micropython/micropyt ... ime.c#L563), where one could, in principle, hook into mp_binary_op, but the flag is defined only for a couple of specific cases and is not generic in this sense.
My question is: shouldn't there be a case at the end of the switch in mp_binary_op that simply falls back to generic_binary_op when everything before it has failed? Or are there other mechanisms for overriding the standard binary operator from the user module itself (i.e., without having to modify the MicroPython code base)?
In numpy, an ndarray can be multiplied by a scalar or by another ndarray, irrespective of the order of the operands. I think it would be great if we could support that.
Thanks,
Zoltán
pythoncoder
Re: ulab, or what you will - numpy on bare metal
Would it be possible to trap the exception and, if it occurs, use the integer to instantiate an object for which you define __mul__ etc.?
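A sketch of that idea in plain Python, in a slight variant (trap-and-swap rather than a wrapper object); the Arr class and the mul helper are hypothetical names, and on the C side the equivalent would be catching the error and re-dispatching:

```python
class Arr:
    """Toy array that, like the ndarray above, only defines __mul__."""
    def __init__(self, data):
        self.data = list(data)

    def __mul__(self, k):
        return Arr(x * k for x in self.data)

def mul(left, right):
    """Multiply two operands, trapping the TypeError raised by e.g. 5 * Arr."""
    try:
        return left * right
    except TypeError:
        # multiplication is commutative, so swapping the operands is safe
        return right * left

a = Arr([1, 2, 3])
print(mul(a, 5).data)  # [5, 10, 15]
print(mul(5, a).data)  # [5, 10, 15] -- the swapped retry
```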
Peter Hinch
Index to my micropython libraries.