Pulse counter performance limit

pipp
Posts: 4
Joined: Thu Jan 28, 2021 8:29 am

Pulse counter performance limit

Post by pipp » Sun Jan 30, 2022 12:28 pm

Hi, fellow pythonistas,

I'm prototyping a mains frequency counter and decided to use a DS3231 RTC for timekeeping. It has a 32768 Hz square-wave output, which means I just have to count the pulses to keep time. Sounds simple, but I think I'm hitting the limits of uPy here, which surprised me. I tried the following demo code:

Code:

import time
from machine import Pin, freq
freq(160000000)  # run the ESP8266 at 160 MHz

clock_pin = Pin(4, Pin.IN, pull=Pin.PULL_UP)  # 32768 Hz square wave from the DS3231
clock_ticks = 0

def clock_cb(p):
    # called on every falling edge of the 32768 Hz signal
    global clock_ticks
    clock_ticks = clock_ticks + 1

clock_pin.irq(trigger=Pin.IRQ_FALLING, handler=clock_cb)

while True:
    clock_ticks = 0
    time.sleep(1)
    print(clock_ticks)  # pulses counted in roughly one second

I know time.sleep() is not good for timekeeping and I probably shouldn't expect spot-on results; however, I consistently get about 23900 pulses per second. I replicated the same code in the Arduino IDE and got incredible results: most of the counts were precisely 32768, so this is not a hardware issue.

I tried decorating it with @micropython.native, which sped it up to about 27200 detected pulses. Is there a way to optimize the interrupt callback function further?
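The timekeeping scheme in the first post is plain arithmetic: N pulses of the 32768 Hz reference span N / 32768 seconds, and a second signal counted over the same window (mains cycles, say) then gives a frequency. A minimal sketch, assuming both counts cover exactly the same window; the helper names are hypothetical and not from the thread:

```python
RTC_HZ = 32768  # nominal frequency of the DS3231 32 kHz output


def elapsed_seconds(rtc_pulses, rtc_hz=RTC_HZ):
    """Time spanned by a number of reference pulses."""
    return rtc_pulses / rtc_hz


def measured_frequency(signal_cycles, rtc_pulses, rtc_hz=RTC_HZ):
    """Frequency of a signal counted over the same window as the RTC pulses."""
    return signal_cycles / elapsed_seconds(rtc_pulses, rtc_hz)


print(elapsed_seconds(32768))                   # 1.0
print(measured_frequency(500, 10 * 32768))      # 50.0
# If only 23900 of the 32768 pulses in one real second are counted,
# 50 mains cycles appear to take ~0.73 s and the estimate is far too high.
print(round(measured_frequency(50, 23900), 1))  # 68.6
```

This is why lost pulses matter so much here: every missed reference edge shortens the apparent window and inflates the computed frequency.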

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Pulse counter performance limit

Post by Roberthh » Sun Jan 30, 2022 12:49 pm

You could try the viper decorator instead of the native. But that may complain about type mixes.
Yes, a Python callback will never be precise. But the internal time of the ESP8266 is crystal controlled and usually pretty precise. Not as precise as a good clock crystal, but at least in the range of ~20 × 10⁻⁶ (20 ppm).
You can also program the DS3231 to generate lower-frequency pulses at the SQW output. You can select between 1, 1024, 4096 and 8192 Hz.
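Robert's suggestion comes down to a few bits in the DS3231 control register. According to the DS3231 datasheet, the control register is at address 0x0E, RS2/RS1 (bits 4 and 3) select 1 Hz, 1.024 kHz, 4.096 kHz or 8.192 kHz, and INTCN (bit 2) must be cleared so that the SQW/INT pin outputs a square wave instead of alarm interrupts. The 32768 Hz signal used in the first post comes from the chip's separate 32kHz pin. The helper below is hypothetical and only computes the register byte; the commented I2C lines show one plausible way to write it on the board, and the pin numbers there are assumptions:

```python
# DS3231 control register (0x0E) bits, as given in the DS3231 datasheet
DS3231_ADDR = 0x68
REG_CONTROL = 0x0E
RS2 = 1 << 4    # rate select, high bit
RS1 = 1 << 3    # rate select, low bit
INTCN = 1 << 2  # 1 = alarm interrupts on SQW/INT, 0 = square wave

# RS2/RS1 combinations for each square-wave rate on the SQW/INT pin
SQW_RATES = {1: 0, 1024: RS1, 4096: RS2, 8192: RS2 | RS1}


def ds3231_sqw_control(current, rate_hz):
    """Return a control-register value that outputs rate_hz on SQW/INT.

    Clears INTCN and replaces RS2/RS1; all other bits of `current` are kept.
    (Hypothetical helper for illustration.)
    """
    if rate_hz not in SQW_RATES:
        raise ValueError("DS3231 SQW supports 1, 1024, 4096 or 8192 Hz")
    cleared = current & ~(RS2 | RS1 | INTCN) & 0xFF
    return cleared | SQW_RATES[rate_hz]


# On the ESP8266 (not executed here; the pins are assumptions):
#   from machine import I2C, Pin
#   i2c = I2C(scl=Pin(5), sda=Pin(14))
#   ctrl = i2c.readfrom_mem(DS3231_ADDR, REG_CONTROL, 1)[0]
#   i2c.writeto_mem(DS3231_ADDR, REG_CONTROL, bytes([ds3231_sqw_control(ctrl, 1024)]))

print(hex(ds3231_sqw_control(0x1C, 1024)))  # 0x8, starting from the power-on value 0x1C
```

Reading the register first and changing only the rate and INTCN bits avoids disturbing the oscillator and alarm settings.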

pipp
Posts: 4
Joined: Thu Jan 28, 2021 8:29 am

Re: Pulse counter performance limit

Post by pipp » Mon Jan 31, 2022 2:50 pm

Thank you, Robert,
Roberthh wrote (Sun Jan 30, 2022 12:49 pm):
You could try the viper decorator instead of the native. But that may complain about type mixes.
That is precisely what happens!

Code:

ViperTypeError: can't do binary op between 'object' and 'int'
(raised when incrementing clock_ticks by one)
Roberthh wrote (Sun Jan 30, 2022 12:49 pm):
But the internal time of the ESP8266 is crystal controlled and usually pretty precise.
I had read that the RTC on the ESP8266 is unreliable, and I assumed this meant all timing-related functions were bad. But you must be right, since I got such nice results with the Arduino tests. I won't overcomplicate things and will simply use the time module.
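Robert's figure of roughly 20 ppm is easier to judge as drift per day, and for a mains frequency counter it also sets the relative error of the measured frequency. A worked example; the 20 ppm value is Robert's estimate, the ±2 ppm comparison is the DS3231 datasheet accuracy for 0 to 40 °C, and the 50 Hz grid is an assumption:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_seconds(ppm, duration_s):
    """Worst-case time error accumulated over duration_s at a given ppm offset."""
    return ppm * 1e-6 * duration_s


def frequency_error_hz(ppm, measured_hz):
    """A timebase off by `ppm` shifts a measured frequency by the same relative amount."""
    return ppm * 1e-6 * measured_hz


for label, ppm in (("ESP8266 crystal, ~20 ppm", 20), ("DS3231, +/-2 ppm", 2)):
    print("{}: up to {:.2f} s/day, {:.4f} Hz on a 50 Hz signal".format(
        label, drift_seconds(ppm, SECONDS_PER_DAY), frequency_error_hz(ppm, 50)))
```

At 20 ppm this works out to about 1.7 s per day, or 0.001 Hz on a 50 Hz reading.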

Thanks a lot!
Best,
Peter

rkompass
Posts: 66
Joined: Fri Sep 17, 2021 8:25 pm

Re: Pulse counter performance limit

Post by rkompass » Mon Jan 31, 2022 6:11 pm

Hello Peter,
you could try the following viper counting function:

Code:

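# have a fast counter using a global variable with micropython.viper
import micropython

gv = b'\x00\x00\x00\x00'  # 4-byte buffer that holds the 32-bit count

@micropython.viper
def count32():
    global gv
    p = ptr32(gv)  # raw 32-bit pointer into the bytes of gv
    p[0] += 1      # increment in place; no Python int object is created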


print('Before: ', int.from_bytes(gv, 'little'))

count32()
count32()
count32()

print('After: ', int.from_bytes(gv, 'little'))
Also have a look at https://github.com/micropython/micropython/issues/8086.
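ptr32(gv) gives viper a raw 32-bit view into the buffer, so the count lives in the four bytes of gv rather than in a Python int object. For experimenting without a board, the same idea can be mimicked in desktop CPython with memoryview.cast("I"). This is only an analogy (viper does not exist in CPython), and it uses a mutable bytearray rather than the bytes literal in the snippet above:

```python
import sys

# A 4-byte mutable buffer plays the role of gv; the cast view is a rough
# stand-in for viper's ptr32 (native byte order, unsigned 32-bit items).
counter_buf = bytearray(4)
counter = memoryview(counter_buf).cast("I")


def count32():
    counter[0] += 1  # updates the bytes of counter_buf in place


print("Before: ", int.from_bytes(counter_buf, sys.byteorder))
count32()
count32()
count32()
print("After: ", int.from_bytes(counter_buf, sys.byteorder))
```

One difference worth knowing: viper's ptr32 arithmetic wraps around at 2**32, whereas CPython raises ValueError if the stored value would leave the unsigned 32-bit range.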

But even if this function is much faster and therefore much better suited to an interrupt handler, I seem to remember that on the ESP8266 only soft interrupts are available. Soft interrupts are queued, and more important interrupts, such as those for WiFi or Bluetooth, may always take priority. So, in essence, you cannot use the ESPs for very precise or timing-critical tasks.
This was discussed in the forum several times.

Anyway, it would be interesting to see what results you get with the above function, perhaps with WiFi and Bluetooth deactivated.

Regards,
Raul

pipp
Posts: 4
Joined: Thu Jan 28, 2021 8:29 am

Re: Pulse counter performance limit

Post by pipp » Mon Jan 31, 2022 7:51 pm

Awesome, in a perverse way I kind of hoped to get my hands dirty with pointer magic :D I'm not sure what I'm doing, but I love it :D

Here are the results: I modified the callback to

Code:

def clock_cb(p):
    global count32
    count32()
Without further modifications the ESP8266 resets in the while loop, so I also implemented a reset function:

Code:

@micropython.viper
def reset32():
    global clock_ticks
    p = ptr32(clock_ticks)
    p[0] = 0

which gets called in the while loop to reset the counter. Unfortunately, the output is the familiar 23700 counts. Interestingly (except for those who understand issue 8086), the following code does not reset the counter:

Code:

while True:
    clock_ticks = b'\x00\x00\x00\x00'
    time.sleep(1)
    print(int.from_bytes(clock_ticks, "little"))

WiFi was disabled with this snippet I found on the forum:

Code:

import network
ap = network.WLAN(network.AP_IF)
ap.active(False)
sta_if = network.WLAN(network.STA_IF)
sta_if.active(False)  
and, alternatively, with

Code:

from network import WLAN
WLAN(0).active(0)
WLAN(1).active(0)
The speedup is marginal, instead of 23700 I get 25200 counts in both cases. I also tried disabling garbage collection with gc.disable(), but there was no significant difference.

If someone has further suggestions I'd be happy to benchmark them, but more out of spite than out of necessity. Thank you both!

rkompass
Posts: 66
Joined: Fri Sep 17, 2021 8:25 pm

Re: Pulse counter performance limit

Post by rkompass » Mon Jan 31, 2022 9:12 pm

I interpret your last observation as: about 1500 times per second, WiFi activity gets in the way of processing the soft interrupt.

Perhaps you can also disable Bluetooth?

I thought you could use count32() in the callback directly, without the new definition of clock_cb().
But it seems that does not matter. If the interrupt callback is fast enough, all that matters is whether the interrupt can be processed before the next triggering edge of the clock. IIRC, even without Bluetooth there are other sources that block the "processing time flow" ;)

pipp
Posts: 4
Joined: Thu Jan 28, 2021 8:29 am

Re: Pulse counter performance limit

Post by pipp » Mon Jan 31, 2022 10:05 pm

Damn, I just put my breadboard away :D

In retrospect, I hadn't been diligent about resetting the board, and I tried so many things that I'd have to repeat those measurements before they carry any weight...
rkompass wrote (Mon Jan 31, 2022 9:12 pm):
Perhaps you can also disable Bluetooth?
ESP8266 doesn't have Bluetooth.
rkompass wrote (Mon Jan 31, 2022 9:12 pm):
I thought you could use count32() in the callback directly, without the new definition of clock_cb().
Now this was positively revolutionary! With this I get to 30286 pulses in a second. Stupid of me to not see this immediately...
Final code looks like this:

Code:

import time
import micropython
from machine import Pin, freq
freq(160000000)


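# Option 1: deactivate both interfaces by index (0 = STA_IF, 1 = AP_IF)
from network import WLAN

WLAN(0).active(0)
WLAN(1).active(0)

# Option 2: the same, using the named constants
# (only one of the two options is needed; they were tested separately)
import network
ap = network.WLAN(network.AP_IF)
ap.active(False)
sta_if = network.WLAN(network.STA_IF)
sta_if.active(False)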


clock_pin = Pin(4, Pin.IN, pull=Pin.PULL_UP) # D2 on NodeMCU
clock_ticks = b'\x00\x00\x00\x00'

@micropython.viper
def count32(pin):  # the IRQ passes the Pin object; it is unused here
    global clock_ticks
    p = ptr32(clock_ticks)
    p[0] += 1

@micropython.viper
def reset32():
    global clock_ticks
    p = ptr32(clock_ticks)
    p[0] = 0

clock_pin.irq(trigger=Pin.IRQ_FALLING, handler=count32)
while True:
    reset32()
    time.sleep(1)
    print(int.from_bytes(clock_ticks, "little"))

As far as WiFi is concerned: without any connections established beforehand, I get the 30286 pulses just fine. If the radios are disabled with option 1, I count the same number of pulses, but if they are disabled with option 2, performance drops to about 28500 pulses. This time I paid more attention to resetting the board between tests, but I can't explain these results. Disabling garbage collection had no effect on performance. I also tried increasing the sleep time to 10 seconds, just to check whether printing and the bytes-to-integer conversion take too long, but I got consistent numbers (302860, i.e. 30286 per second).

rkompass
Posts: 66
Joined: Fri Sep 17, 2021 8:25 pm

Re: Pulse counter performance limit

Post by rkompass » Sat Feb 05, 2022 11:39 pm

I took some measurements with my ESP8266 NodeMCU.

With the code:

Code:

# time fast counters

import micropython
import utime
import machine

machine.freq(160000000)

gv = b'\x00\x00\x00\x00'
gb : int = 0


@micropython.viper
def count32a():
    global gv
    ptr32(gv)[0] += 1

@micropython.viper
def count32b():
    global gb
    q : int = int(gb)
    q += 1
    gb = q # << 1 | 1  # <- workaround!

@micropython.viper
def i1000xCount32a():
    c = count32a
    for _ in range(20):
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()

@micropython.viper
def i1000xCount32b():
    c = count32b
    for _ in range(20):
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()
        c(); c(); c(); c(); c(); c(); c(); c(); c(); c()


print('Before: ', int.from_bytes(gv, 'little'))
t = utime.ticks_us()
i1000xCount32a()
delta = utime.ticks_diff(utime.ticks_us(), t)
print('After: ', int.from_bytes(gv, 'little'))
print('Function count32a took {:5.2f} us per call'.format(delta/1000.0))

print('\n---------------------------------------')

print('Before: ', gb)
t = utime.ticks_us()
i1000xCount32b()
delta = utime.ticks_diff(utime.ticks_us(), t)
print('After: ', gb)
print('Function count32b took {:5.2f} us per call'.format(delta/1000.0))

I got 4.14 us per call for count32a() and 5.72 us per call for count32b().
The function call itself accounts for most (> 90%) of that; the increment takes only a few nanoseconds.
We cannot improve on this as long as every call has to wrap the fast native instructions in a Python-compatible calling convention.
At 32768 Hz you have 30.5 us between the interrupting edges, so there should be enough time to process the interrupts. Something else must account for the missing 2500 or so interrupts.
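The timing budget in this post, written out with the thread's own numbers. Note that 4.14 us is the cost of a direct call from Python; dispatching a soft interrupt adds overhead that this arithmetic does not include. 30286 is the best per-second count reported earlier in the thread:

```python
RTC_HZ = 32768
period_us = 1e6 / RTC_HZ   # time between falling edges
call_us = 4.14             # measured cost of one count32a() call
counted = 30286            # best per-second count reported in the thread

missed = RTC_HZ - counted
print("period between edges: {:.2f} us".format(period_us))
print("handler cost: {:.0%} of the period".format(call_us / period_us))
print("missed: {} edges/s ({:.1%})".format(missed, missed / RTC_HZ))
```

This prints a period of 30.52 us, a handler cost of about 14% of it, and 2482 missed edges per second (7.6%), which is the "2500 or so" gap above.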

I will try to repeat your experiments with my Black Pill board and its timer device. Can you post the code for programming the DS3231?
The interrupts will probably be faster on the STM32 chips.

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Re: Pulse counter performance limit

Post by Roberthh » Sun Feb 06, 2022 8:52 am

With ESP8266, ESP32 and generally all MCUs where the code is stored in SPI flash memory, you can always have cache misses during code execution. The code then has to be fetched from SPI flash. For the ESP8266, this takes 200-300 µs. That rules out reliable real-time performance.
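Robert's 200 to 300 µs figure can be set against the 30.5 µs edge period. The sketch below counts how many edges fall inside such a stall. The loss estimate rests on an assumption that is not stated in the thread: edges arriving while execution is stalled collapse into a single pending interrupt, so all but one of them are lost. The function names are hypothetical:

```python
import math

PERIOD_US = 1e6 / 32768  # ~30.52 us between edges of the 32768 Hz signal


def edges_during_stall(stall_us):
    """Upper bound on the number of edges that fall inside a stall of stall_us."""
    return math.ceil(stall_us / PERIOD_US)


def edges_lost(stall_us):
    """Edges lost if (by assumption) only one pending interrupt survives a stall."""
    return max(edges_during_stall(stall_us) - 1, 0)


for stall in (200, 300):
    print(stall, "us stall ->", edges_lost(stall), "edges lost")
```

Under that assumption a single 200 µs stall costs up to 6 pulses and a 300 µs stall up to 9, so even a few hundred cache misses per second would be enough to explain counts in the observed range.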

rkompass
Posts: 66
Joined: Fri Sep 17, 2021 8:25 pm

Re: Pulse counter performance limit

Post by rkompass » Sun Feb 06, 2022 10:46 am

OK, then there is no point in hoping for hard interrupts.
