Pin Toggle Frequency Contest against C. Please Help! :)

mad474
Posts: 60
Joined: Sun Dec 29, 2013 7:48 pm

Post by mad474 » Wed Feb 03, 2016 4:00 pm

Am I understanding this correctly? The 21MHz toggle assertion isn't confirmed yet? Dave, have you got the setup to check that C-Program? Sorry, but I haven't.

dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada

Post by dhylands » Wed Feb 03, 2016 4:07 pm

So far, I haven't been able to confirm it. gcc doesn't seem to produce code which would go that fast (I even tried -O3). However, that's not to say that one of the other compilers can't do it (gcc isn't known for producing the tightest code - especially on ARM). From what I've heard, IAR does a better job, but I don't have anything other than gcc installed.

mad474
Posts: 60
Joined: Sun Dec 29, 2013 7:48 pm

Post by mad474 » Wed Feb 03, 2016 4:16 pm

Thanks Dave! That guy claims

Code: Select all

Run on the STM32F407VG Discovery board at a 168 MHz clock frequency,
compiled with GCC and -Os.
"result" read out with the debugger: 21000001

which is self-explanatory. If nobody stops me here, I'm going to ask for (verifiable) confirmation over there.

dhylands
Posts: 3821
Joined: Mon Jan 06, 2014 6:08 pm
Location: Peachland, BC, Canada

Post by dhylands » Wed Feb 03, 2016 5:08 pm

I think it's plausible. Getting the .elf file would allow disassembly.

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Post by Roberthh » Wed Feb 03, 2016 9:00 pm

Hello folks, since this is all very interesting, I did a test. I took the function togglePerformance3d and hooked up an oscilloscope to the red LED. It shows a period of 298 ns, i.e. a frequency of 3.356 MHz. The output of the script is:

Counted: 6,700,168 (viper4) (time=597ms)

where the time and the 2 million loop iterations match the 3.3 MHz measurement. The Pyboard runs at its default settings; pyb.freq() shows 168 MHz.
It's a Pyboard V1.0.
For the first assembler version I get a period of 166 ns, or 6 MHz.
The second assembler version did not change the level at the LED, so I could not take a sample.
The factor of 2 between the counted value and the scope frequency should result from the way of counting: in each period the LED is toggled twice (on and off).

Update: I've just seen why the second assembler version did not toggle the LED. In the second instruction that switches the output, the offset should be 2. For timing purposes this is irrelevant, but on a scope there is no change.
Last edited by Roberthh on Thu Feb 04, 2016 9:13 am, edited 1 time in total.
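
As a quick sanity check of the figures in this post, here is a small sketch in plain Python (it runs on a PC, not only on the Pyboard). The helper name freq_mhz and the variable names are made up for illustration. The inputs are the scope periods and the script output quoted above.

```python
def freq_mhz(period_ns):
    """Convert a signal period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns

# Scope readings quoted in the post above.
print(round(freq_mhz(298), 3))   # viper loop on the red LED -> 3.356
print(round(freq_mhz(166), 3))   # first assembler version   -> 6.024

# Cross-check against the script output: 2,000,000 iterations in 597 ms.
iterations = 2000000
elapsed_ms = 597
loops_per_s = iterations * 1000 / elapsed_ms   # full on/off periods per second
toggles_per_s = 2 * loops_per_s                # two pin changes per period
print(round(loops_per_s), round(toggles_per_s))  # -> 3350084 6700168
```

The loop rate of about 3.35 million per second agrees with the 3.356 MHz seen on the scope, and doubling it reproduces the "Counted: 6,700,168" line, which is the factor of 2 mentioned above.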

mad474
Posts: 60
Joined: Sun Dec 29, 2013 7:48 pm

Post by mad474 » Wed Feb 03, 2016 9:20 pm

Yay! Thanks Roberthh! Was too shy to ask for a scope validation. All I can say is that I get about the same output running togglePerformance3d(). But my mechanical stopwatch isn't really useful for this sort of time measurement :D

chuckbook
Posts: 135
Joined: Fri Oct 30, 2015 11:55 pm

Post by chuckbook » Wed Feb 03, 2016 9:52 pm

@dhylands, just out of curiosity. Did you try the asm test on a STM32F7xx board? I got 72MHz on PA13 at 216 MHz core freq.
Can you confirm that? Thanks!

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Post by Roberthh » Thu Feb 04, 2016 7:26 am

With a slightly changed togglePerformance3d() the results are a little bit better, using port A0. I get

Counted: 3,565,062 (viper4) (time=561ms)

which would be 7,130,124 toggles per second. For comparison, I made a single endless loop toggling X1 (port A0). The loop period is 82 ns, or 12 MHz; the shortest time between slopes is 23 ns.

In May last year, as part of the trial that brought me to the Pyboard, I made the same tests with various boards, just to see how fast and how short pulses can be generated. The endless loop test on a Teensy 3.1 resulted in a 53 ns period. The code looked like C, but the macros in the Teensy package generate direct port instructions out of that. I had to add a "DSB" instruction between the two port commands to allow the I/O bus to settle. Maybe that instruction could be added to MicroPython assembly. I saw that dhylands added an instruction between the two port commands, maybe for that purpose.

Code: Select all

import micropython
import pyb
import stm

@micropython.viper
def togglePerformance3d():
    x1_pin = pyb.Pin('X1', pyb.Pin.OUT_PP)  # configure X1 (port A0) as push-pull output
    bsrrl = ptr16(stm.GPIOA + stm.GPIO_BSRRL)
    bsrrh = ptr16(stm.GPIOA + stm.GPIO_BSRRH)
    start = pyb.millis()
    for _ in range(2000000):
        bsrrl[0] = 1
        bsrrh[0] = 1
    time = pyb.elapsed_millis(start)
    count = round(2e9 / time)  # loop iterations per second; each iteration toggles twice
    print('Counted: {:10,} (viper4) (time={}ms)'.format(count, time))

@micropython.viper
def loopPerformance():
    x1_pin = pyb.Pin('X1', pyb.Pin.OUT_PP)  # configure X1 (port A0) as push-pull output
    bsrrl = ptr16(stm.GPIOA + stm.GPIO_BSRRL)
    bsrrh = ptr16(stm.GPIOA + stm.GPIO_BSRRH)
    while True:  # endless loop; stop with Ctrl-C
        bsrrl[0] = 1
        bsrrh[0] = 1

I use A0, so I can simply use 1 instead of 1 << 13. The bsrrl[0] = 1 causes the rising slope, so I have to check the definitions of the constants, or the data sheet. Maybe the output is inverted. For the moment, this is not important. The interrupts are still enabled, which adds a little bit of jitter but allows the code to be stopped with Ctrl-C.

Update: I just looked into the CPU manuals (last resort). Since the processor uses a little-endian format, the definitions of BSRRL and BSRRH have to be swapped. The lower address sets the bit.

Update 2: The loop above written in assembly gives a period of 59 ns, or 16.9 MHz, and a high pulse of 23 ns. The slope times are below 2 ns. So I have no idea where the 84 MHz claim of the data sheet comes from. Here's the endless loop code. X1 was set to output. No bus barrier instruction was required.

Code: Select all

import micropython
import stm

@micropython.asm_thumb
def loop():    # takes no arguments and never returns
    movwt(r1, stm.GPIOA)    # Use A0
    add(r1, stm.GPIO_BSRRL) # r1 points at the set half of BSRR
    movw(r2, 1)       # r2 has mask for setting A0

# loop
    label(loop)
    strb(r2, [r1, 0])  # output high
    strb(r2, [r1, 2])  # output low (reset half, 2 bytes higher)
    b(loop)
# endloop
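
To relate the measured periods to the 168 MHz core clock, the following plain-Python sketch converts each period quoted in this thread into core clock cycles. The helper name cycles_per_period and the labels are invented for illustration; only the numbers come from the posts above. How many cycles each instruction really takes is not something this arithmetic can show.

```python
CORE_MHZ = 168  # pyb.freq() value reported earlier in the thread

def cycles_per_period(period_ns, core_mhz=CORE_MHZ):
    """Number of core clock cycles that fit into one output period."""
    return period_ns * core_mhz / 1000.0

# (label, period in ns) as measured on the scope in this thread
measurements = [
    ("viper, timed loop (LED)", 298),
    ("first assembler version", 166),
    ("viper, endless loop", 82),
    ("assembler endless loop", 59),
]
for name, period_ns in measurements:
    print("{:26s}{:4d} ns  ~{:.1f} cycles".format(
        name, period_ns, cycles_per_period(period_ns)))

# Reference points: the data sheet's 84 MHz and the 21 MHz claimed for C.
print(CORE_MHZ / 84)  # -> 2.0 cycles per period
print(CORE_MHZ / 21)  # -> 8.0 cycles per counted event
```

By this estimate the assembler loop needs roughly 10 core cycles per period, while 84 MHz would leave only 2 cycles for a full high/low period.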

Damien
Site Admin
Posts: 647
Joined: Mon Dec 09, 2013 5:02 pm

Post by Damien » Thu Feb 04, 2016 11:37 am

"I use A0, so I can simply use 1 instead of 1 << 13"
The "1 << 13" should be optimised by the compiler to a single constant (although loading 0x2000 into a register might take more instructions than simply loading 0x1).
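
For illustration, here is a plain-Python sketch of the constants involved. It assumes the STM32F4 GPIO BSRR layout, where bits 0 to 15 set a pin and bits 16 to 31 reset it, so that the two 16-bit halves can each be written with the same 16-bit mask. The function name bsrr_masks is hypothetical and not part of MicroPython.

```python
def bsrr_masks(pin):
    """Return (set_mask, reset_mask) for a 32-bit write to GPIOx_BSRR.

    Assumes the STM32F4 layout: low half-word sets, high half-word resets.
    """
    if not 0 <= pin <= 15:
        raise ValueError("pin must be in the range 0..15")
    set_mask = 1 << pin          # also the 16-bit mask for either half
    reset_mask = set_mask << 16  # same bit, moved into the reset half
    return set_mask, reset_mask

print(hex(1 << 13))                      # -> 0x2000
print([hex(m) for m in bsrr_masks(0)])   # -> ['0x1', '0x10000']
print([hex(m) for m in bsrr_masks(13)])  # -> ['0x2000', '0x20000000']
```

Either way the mask is a compile-time constant, so the choice of pin affects only how the constant is loaded before the loop, not the loop itself.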

Maybe the IO can go faster if the peripheral bus frequency is increased (if it's not already at its maximum)?

@Roberthh, regarding your question about viper docs: there are none at the moment, and this forum topic is probably one of the better places to start learning! You can look at the tests in tests/micropython/viper* to see some of the features that viper has.

I have intentionally not provided docs for viper mode because it's still work-in-progress and hence may change semantics. I don't want people to rely on it just yet. But it's really important to test viper and get ideas for making it better, so that's why I'm taking an interest in this discussion.

Roberthh
Posts: 3667
Joined: Sat May 09, 2015 4:13 pm
Location: Rhineland, Europe

Post by Roberthh » Thu Feb 04, 2016 12:26 pm

Thanks. I'm very excited about this discussion, because it gives very helpful information for two applications of the Pyboard where I thought it would be too slow. But viper mode combined with direct addressing of peripheral registers is just the right mix of speed and code readability.
