I've made a port to the OWL platform, which is a bit different from other microcontroller targets in that it runs in user space, so there's an additional abstraction layer between hardware and interpreter. The purpose is to use it in the context of live coding and interactive realtime audio, typically as part of a Eurorack synth setup.
The processors used by OWL devices are STM32F4 and STM32H7 (Arm Cortex-M), with 8 Mbyte of additional external RAM and Flash, and a stereo codec.
Details of the draft version are here:
https://github.com/pingdynasty/micropyt ... l-port/owl
I'll try to detail some of the issues that have come up below.
First up: I created an "owl" module for hardware specific control. I borrowed what I could from the stm32 port to keep some consistency. It would be great to get some feedback on this. Specifically the overloading of e.g. owl.parameter() for both getting and setting parameter values - would you consider it better (though wordier) to use owl.set_parameter() and owl.get_parameter()?
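To make the question concrete, here is a minimal sketch of the two API styles in plain Python. A dict stands in for the hardware and the function bodies are illustrative assumptions, not the port's actual implementation. Only the names `parameter`, `set_parameter` and `get_parameter` come from the question above.

```python
# Hypothetical stand-in for the hardware parameter store.
_params = {}

_MISSING = object()  # sentinel, so that None remains a legal parameter value


def parameter(index, value=_MISSING):
    """Overloaded style: one argument reads, two arguments write."""
    if value is _MISSING:
        return _params.get(index, 0.0)
    _params[index] = value


def get_parameter(index):
    """Explicit style: read only."""
    return _params.get(index, 0.0)


def set_parameter(index, value):
    """Explicit style: write only."""
    _params[index] = value


parameter(0, 0.5)        # overloaded setter
set_parameter(1, 0.25)   # explicit setter
print(parameter(0), get_parameter(1))
```

The overloaded form is shorter at the call site, while the explicit pair makes it obvious from the name alone whether a call reads or writes.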
Second: As the purpose is realtime audio, it is important that the user can produce samples and control signals at sample and block rate, respectively. To this end I made the setter functions accept iterators as well as fixed values. Then the iterators are invoked at a rate that is internally clocked. This works fine for outputs, but for inputs I'm not sure how exactly to do it. I assume passing a callback function is the way to go. Are there any examples of this in existing ports that I could look at?
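As a rough model of this scheme, the sketch below pulls one value from each output iterator per clock tick and pushes each input value to a registered callback. The `Engine` class, its `tick()` method and the `input(channel, callback)` signature are all hypothetical, written in plain Python only to illustrate the idea. None of them is the existing module API.

```python
def _repeat(value):
    # Turn a fixed value into an endless iterator.
    while True:
        yield value


class Engine:
    """Hypothetical stand-in for the internally clocked engine."""

    def __init__(self):
        self.sources = {}  # output channel -> iterator
        self.sinks = {}    # input channel -> callback

    def output(self, channel, value):
        # Accept either an iterable or a fixed value, as the setters do.
        try:
            self.sources[channel] = iter(value)
        except TypeError:
            self.sources[channel] = _repeat(value)

    def input(self, channel, callback):
        # Assumed input API: register a function called once per tick.
        self.sinks[channel] = callback

    def tick(self, inputs):
        # One clock tick: push inputs to callbacks, then pull one value per output.
        for channel, callback in self.sinks.items():
            callback(inputs[channel])
        return {ch: next(it) for ch, it in self.sources.items()}


engine = Engine()
engine.output(0, iter([0.1, 0.2, 0.3]))  # iterator source
engine.output(1, 0.5)                    # fixed value
received = []
engine.input(0, received.append)         # callback sink
for sample in (1.0, 2.0):
    print(engine.tick({0: sample}))
print(received)
```

In this model outputs are pulled and inputs are pushed, which is the asymmetry the question is about.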
Third and last:
Performance when running at audio rate is frequently terrible, and I'd like to better understand why that is. An example:
```python
freq = 440

def saw():
    # Naive sawtooth: ramps from -1 to 1 at `freq` Hz, assuming a 48 kHz sample rate
    ph = 0
    while True:
        yield ph
        ph += (2 * freq) / 48000
        if ph >= 1:
            ph -= 2

owl.output(0, saw())
```
In comparison, iterating over an array uses only 6% CPU (including fixed block processing overheads):
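```python
import math

def cycle(p):
    # Repeat the items of p forever; a one-shot iterator is cached on the first pass
    try:
        len(p)
    except TypeError:
        cache = []
        for i in p:
            yield i
            cache.append(i)
        p = cache
    while p:
        yield from p

freq = 440
n = int(48000 / freq)  # samples per cycle (avoids shadowing the built-in len)
a = [math.sin(2 * math.pi * x / n) for x in range(n)]
owl.output(0, cycle(a))
```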
In the module implementation, I call `it = mp_getiter(obj)` once, and then invoke `mp_iternext(it)` for each output sample.
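In Python terms, that pattern corresponds roughly to calling `iter()` once and then `next()` once per sample. The sketch below models it in plain Python. The `fill_block` helper, the block size of 64 and the zero-fill on exhaustion are assumptions for illustration, not what the C code necessarily does.

```python
def saw(freq=440, sample_rate=48000):
    # Same sawtooth as above, with the constants passed as arguments.
    ph = 0
    while True:
        yield ph
        ph += (2 * freq) / sample_rate
        if ph >= 1:
            ph -= 2


def fill_block(it, block):
    """Pull one value per sample, like one mp_iternext() call per sample."""
    for i in range(len(block)):
        try:
            block[i] = next(it)
        except StopIteration:
            block[i] = 0.0  # assumed behaviour once the iterator runs dry
    return block


it = iter(saw())                    # done once, like mp_getiter(obj)
block = fill_block(it, [0.0] * 64)  # 64 is an arbitrary block size
print(block[:3])
```

Each sample therefore costs one full generator resume, plus whatever arithmetic the generator body performs before its next `yield`.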
The benchmarking here is not very scientific and the percentages should be taken with a grain of salt, but the performance difference is still huge. Is it likely to be due to how the Python code is compiled and interpreted, or to how I use the C bindings? What can I do to get the first case to perform as well as the second?