I agree that the bottleneck seems to be in allocating float objects rather than the floating point computations themselves, and I devised some more tests to try to demonstrate that.
(I thought about creating a new topic for 'Floating Point issues' but there were already responses here by the time I got back to this.)
Just so the point doesn't get lost at the end of a long and detailed post: I'm curious whether anyone else is concerned about being able to write efficient FP routines in uPy, and whether this is worth raising as an issue on GitHub.
The following functions can demonstrate what @dhylands and I found:
Code: Select all
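import pyb

def runtime(func, n):
    # Time a single call of func(n) and report the elapsed seconds.
    a = pyb.millis()  # Starting time
    b = func(n)
    print('Test ran for', (pyb.millis() - a)/1000, 'seconds')
    return b

def test_f1(n):
    # One float add per iteration, operand held in a local variable.
    s = 0.0
    x = 1.0
    for i in range(n):
        s = s + x
    return s

def test_f2(n):
    # One float add per iteration, operand written as a literal.
    s = 0.0
    for i in range(n):
        s = s + 1.0
    return s

def test_f3(n):
    # One float multiply and one float add per iteration.
    s = 0.0
    x = 1.0
    for i in range(n):
        s = s + (x * x)
    return s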
With the following results:
Code: Select all
>>> runtime(test_f1, 100000)
Test ran for 11.668 seconds
100000.0
>>> runtime(test_f2, 100000)
Test ran for 22.411 seconds
100000.0
>>> runtime(test_f3, 100000)
Test ran for 22.435 seconds
100000.0
The tests show that executing "s = s + 1.0" takes essentially the same time as executing "s = s + (x * x)" (22.411 vs. 22.435 seconds), probably because a new float object is created each time we evaluate "1.0", just as one is created each time we evaluate "x * x". Executing "s = s + (1.0 * x)" takes 32.894 seconds, nearly 3 times as long as "s = s + x" (11.668 seconds), presumably because two intermediate float objects are made per iteration instead of none, and so on.
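To make the comparison easier to read, here is a small sketch that turns the timings reported above into a per-iteration cost and a ratio against the plain "s = s + x" loop. The names TIMINGS, per_iteration_us and summarize are my own, made up for this illustration; the numbers are just the ones quoted in this post, and nothing here is measured anew.

```python
# Timings in seconds as reported in this post, each for 100000 iterations.
ITERATIONS = 100000
TIMINGS = [
    ("s = s + x", 11.668),
    ("s = s + 1.0", 22.411),
    ("s = s + (x * x)", 22.435),
    ("s = s + (1.0 * x)", 32.894),
]

def per_iteration_us(seconds, iterations=ITERATIONS):
    # Average cost of one loop iteration, in microseconds.
    return seconds / iterations * 1e6

def summarize(rows):
    # Return (label, microseconds per iteration, ratio to the first row).
    baseline = rows[0][1]
    return [(label, per_iteration_us(t), t / baseline) for label, t in rows]

for label, us, ratio in summarize(TIMINGS):
    print(label, '->', round(us, 1), 'us/iteration,', round(ratio, 2), 'x baseline')
```

On these figures the baseline loop costs about 117 microseconds per iteration, and each additional float-producing step adds very roughly another 105 to 107 microseconds (ratios of about 1.92, 1.92 and 2.82). That is consistent with the allocation explanation, but it doesn't prove it, since each extra step also adds some bytecode dispatch overhead.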
To really push the floating point processing on the chip we need a function that does its FP calculations at the C level rather than the Python level:
Code: Select all
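import math
from math import cos

def test_cos1(n):
    # One cosine and one float add per iteration.
    s = 0.0
    for i in range(n):
        s = s + cos(i)
    return s

def test_cos2(n):
    # One float multiply, one cosine and one float add per iteration.
    s = 0.0
    x = math.pi/3
    for i in range(n):
        s = s + cos(i*x)
    return s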
Results:
Code: Select all
>>> runtime(test_cos1,100000)
Test ran for 24.941 seconds
1.03239
>>> runtime(test_cos2,100000)
Test ran for 36.563 seconds
0.0476162
The way I see it, "test_cos1" (add+cos) is analogous to "test_f1" (add), with the FP-intensive cosine operation contributing about 13 seconds to the execution time (24.941 - 11.668), and "test_cos2" (add+mult+cos) is analogous to "test_f3" (add+mult), with the cosine adding about 14 seconds (36.563 - 22.435). So 100000 cosine calls cost roughly the same as one extra float operation per iteration at the Python level. (Side note: "test_cos2" also reveals rounding errors in the single-precision computation of "cos(i*x)"; each iteration should yield 1.0, 0.5, -0.5, or -1.0, but larger values of i lead to noticeable rounding errors.)
Note: edited so that the code examples can be cut and pasted directly into a fresh REPL.
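To put a number on the rounding error mentioned in the side note, here is a minimal sketch of what the sum in "test_cos2" would be with exact arithmetic. It relies on cos(i * pi/3) repeating with period 6 in i, and on one full period summing to zero, so only the leftover terms matter. EXACT_COS and exact_cos_sum are names I made up for this illustration.

```python
# Exact values of cos(i * pi/3) for i = 0..5; the pattern repeats with period 6.
EXACT_COS = (1.0, 0.5, -0.5, -1.0, -0.5, 0.5)

def exact_cos_sum(n):
    # Exact value of sum(cos(i * pi/3) for i in range(n)).
    # Every complete period of 6 terms sums to zero, so only the
    # first n % 6 terms of the pattern contribute.
    s = 0.0
    for k in range(n % 6):
        s = s + EXACT_COS[k]
    return s

print(exact_cos_sum(100000))
```

For n = 100000 there are four leftover terms (1.0 + 0.5 - 0.5 - 1.0), so the exact sum is 0.0, and the 0.0476162 returned by "test_cos2" is all accumulated error.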