[Any STM32 board] How long should this loop take?

Post by shazz » Thu May 23, 2019 1:01 am

Hi,

I'm debugging some weird clock settings on my board, so I ran some performance tests.
I added a C module that does this:

Code:

#define NOP asm volatile(" nop \n\t")

// Busy-loop for 10,000,000 passes and return the elapsed HAL ticks.
STATIC mp_obj_t perf_count(size_t n_args, const mp_obj_t *args) {
    (void)n_args;
    (void)args;

    uint32_t start = HAL_GetTick();
    uint32_t count = 0;
    uint32_t end = 0;

    // The volatile asm NOP keeps gcc from optimizing the empty loop away.
    for (count = 0; count < 10000000; count++) { NOP; }

    end = HAL_GetTick();

    return mp_obj_new_int(end - start);
}
STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(perf_count_obj, 0, 2, perf_count);
This loop takes between 477 and 715 ticks depending on the sysclk frequency (84 MHz and 56 MHz, if they are set correctly, which I doubt),
so that's about 21 million passes per second (21 MHz) at best, right?

I don't know the ARM Cortex-M4 instruction timings (and I'm not sure where to find them for the STM32F4), and I'm also not sure which instructions gcc will generate for this loop. Something like this?

Code:

    movw r0, #0xffff
    movt r0, #0xffff
loop:
    sub r0, r0, #1
    cmp r0, #0
    bhi loop
Put differently, is there a way to check that when sysclk is set to 84 MHz, it really is 84 MHz? (I doubt the HSE value is right.)

Thanks
8bits should be enough...

Post by jimmo » Thu May 23, 2019 1:22 am

shazz wrote:
Thu May 23, 2019 1:01 am
... and I'm also not sure which instructions gcc will generate for this loop.
arm-none-eabi-objdump is your friend!

Post by jimmo » Thu May 23, 2019 1:42 am

shazz wrote:
Thu May 23, 2019 1:01 am
Put differently, is there a way to check that when sysclk is set to 84 MHz, it really is 84 MHz? (I doubt the HSE value is right.)
Do you have a scope or signal analyser?

Sorry, it's been a while since I've done this (and I can't remember if it was for sysclk), but you should be able to enable the MCO (clock output) alternate function on a pin and route the clock to it.

Post by OutoftheBOTS_ » Thu May 23, 2019 4:03 am

You can output any of the clocks to the MCO pins (see the attached picture of the setup for the STM32F4), then read them with a scope :)
[Attachment: mco.JPG]

Post by chuckbook » Thu May 23, 2019 8:32 am

The loop in the asm example takes 4 cycles per pass.
With 1e7 passes, that gives 4e7 cycles.
At 84 MHz I would expect ~476 ms.
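
The estimate above can be reproduced with a few lines of Python. This is only a rough sketch: it assumes a fixed 4 cycles per pass as stated, ignores flash wait states and interrupts, and the function name `expected_loop_ms` is made up for illustration.

```python
def expected_loop_ms(passes, cycles_per_pass, sysclk_hz):
    """Expected run time in milliseconds for a loop with a fixed cost per pass."""
    total_cycles = passes * cycles_per_pass
    return total_cycles * 1000 / sysclk_hz


# The two sysclk settings mentioned in the first post.
for mhz in (84, 56):
    ms = expected_loop_ms(10_000_000, 4, mhz * 1_000_000)
    print(f"{mhz} MHz: {ms:.1f} ms")
```

If the measured tick count is close to this figure, the configured sysclk and the cycle count are at least consistent with each other.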

Post by shazz » Fri May 24, 2019 12:19 am

Thanks chuckbook, OutoftheBOTS_, and jimmo.

My logic analyzer doesn't have enough bandwidth, I presume... so objdump will do :)

gcc is smarter than me:

Code:

00000000 <perf_count>:
   0:   b510            push    {r4, lr}
   2:   f7ff fffe       bl      0 <HAL_GetTick>
   6:   4b06            ldr     r3, [pc, #24]   ; (20 <perf_count+0x20>)
   8:   4604            mov     r4, r0
  -------------------------------- 
   a:   bf00            nop
   c:   3b01            subs    r3, #1
   e:   d1fc            bne.n   a <perf_count+0xa>
  --------------------------------    
  10:   f7ff fffe       bl      0 <HAL_GetTick>
  14:   1b00            subs    r0, r0, r4
  16:   e8bd 4010       ldmia.w sp!, {r4, lr}
  1a:   f7ff bffe       b.w     0 <mp_obj_new_int>
  1e:   bf00            nop
  20:   00989680        .word   0x00989680      ; 10000000
So the loop takes 1+1+2 cycles (nop, subs, bne), so still 4 cycles, but with the NOP included. So my results make complete sense:
  • 477 ms at 84 MHz (10 017 000 loops)
  • 556 ms at 72 MHz (10 008 000 loops)
  • 715 ms at 56 MHz (10 010 000 loops)
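
The loop counts in the list can be checked by working backwards from the measured ticks. A minimal sketch, assuming 4 cycles per pass and that each tick is a true millisecond (which may not hold if the tick source runs from the same, possibly misconfigured, clock); the helper names are hypothetical.

```python
CYCLES_PER_PASS = 4  # nop + subs + bne, as counted from the objdump listing


def implied_passes(ticks_ms, sysclk_mhz):
    """Passes that fit into the measured time if sysclk is as configured."""
    return ticks_ms * sysclk_mhz * 1000 // CYCLES_PER_PASS


def implied_sysclk_mhz(ticks_ms, passes=10_000_000):
    """Clock frequency needed to run `passes` passes in the measured time."""
    return passes * CYCLES_PER_PASS / (ticks_ms * 1000)


for ticks, mhz in ((477, 84), (556, 72), (715, 56)):
    print(ticks, mhz, implied_passes(ticks, mhz), round(implied_sysclk_mhz(ticks), 2))
```

The second helper turns the question around: given the measured ticks, it reports which sysclk would explain them.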
But then I don't understand why USB OTG only works when its clock is set to 32 MHz (and not 48 MHz, as stated in the source code):

Code:

// HSE is 8MHz
#define MICROPY_HW_CLK_PLLM (12)
#define MICROPY_HW_CLK_PLLN (336)
#define MICROPY_HW_CLK_PLLP (RCC_PLLP_DIV4)
#define MICROPY_HW_CLK_PLLQ (7)
Meaning:

Code:

VCO = HSE * PLLN / PLLM = 224 MHz
USB OTG, SDIO, RNG = VCO / PLLQ = 32 MHz
CPU = VCO / PLLP = 56 MHz
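
The same arithmetic as a small Python sketch, so other divider combinations can be tried quickly. It assumes HSE really is 8 MHz and that RCC_PLLP_DIV4 means a divide-by-4; `pll_clocks` is a made-up helper, not part of MicroPython or the HAL.

```python
def pll_clocks(hse_mhz, pllm, plln, pllp, pllq):
    """Return (vco_mhz, sysclk_mhz, usb_mhz) for the given PLL dividers."""
    vco = hse_mhz * plln / pllm
    return vco, vco / pllp, vco / pllq


# Values from the board config above (HSE assumed to be 8 MHz).
vco, cpu, usb = pll_clocks(hse_mhz=8, pllm=12, plln=336, pllp=4, pllq=7)
print(vco, cpu, usb)
```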

Post by jimmo » Fri May 24, 2019 12:45 am

I find the STM32CubeMX tool pretty handy for doing the clock calculations and some of this initial config stuff.

Here's a screenshot of your config:
[Attachment: Screenshot_2019-05-24_10-41-51.png]
The error info for the /M node is "PLLM output frequency is currently set to 0.666666 MHz. Must be >0.95 MHz and < 2.1 MHz"

The tool suggests (for 8 MHz HSE and 48 MHz USB):

Code:

/M   4
*N   72
/P   4
/Q   3
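
The range check behind that error message is easy to reproduce. A sketch only: the limits are taken from the CubeMX message quoted above rather than from the datasheet, and the helper names are hypothetical.

```python
PLL_INPUT_MIN_MHZ = 0.95  # lower bound quoted in the CubeMX message
PLL_INPUT_MAX_MHZ = 2.1   # upper bound quoted in the CubeMX message


def pll_input_mhz(hse_mhz, pllm):
    """Frequency at the /M node, i.e. the PLL input."""
    return hse_mhz / pllm


def pll_input_ok(hse_mhz, pllm):
    """True if the /M node frequency is inside the quoted limits."""
    return PLL_INPUT_MIN_MHZ < pll_input_mhz(hse_mhz, pllm) < PLL_INPUT_MAX_MHZ


# /M = 12 is the board's current value, /M = 4 is the tool's suggestion.
for m in (12, 4):
    print(m, round(pll_input_mhz(8, m), 6), pll_input_ok(8, m))
```

With an 8 MHz HSE, /M = 12 gives about 0.67 MHz (out of range), while /M = 4 gives 2 MHz (in range).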

Post by shazz » Fri May 24, 2019 3:01 am

Ah!!! I didn't know about this tool! I spent hours on a spreadsheet doing something similar (but not as good, for sure...)

Nice tool. I updated the config to have the correct APBx prescalers and SYSCLK:
[Attachment: clock.png]
Same "warning", but that's so weird, as the board works, and so does USB, with those settings.
To fix it, in my case it proposes:

Code:

M:4, N:72, P:2, Q:3 => USB:48, SYSCLK=72
I tried this fix and other settings that set the USB clock to 48 MHz, as it should be, but it doesn't work.

The only settings I have found that work (with USB) are:

Code:

M: 12, N: 336, P:4, Q:7
M: 8, N: 288, P:4, Q:9
See below: same USB frequency, 32 MHz...
[Attachment: clock2.png]
So I thought the HSE value was wrong (with HSE = 12 MHz, these settings would generate a 48 MHz USB clock and hit all the max values!)
[Attachment: clockHSE12.png]
That's why I ran those performance tests, but it seems HSE really is 8 MHz...
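
The two HSE hypotheses can be compared side by side for the settings that work. A small sketch: the configs are the two listed above, while the helper name and the dict layout are made up for illustration.

```python
WORKING_CONFIGS = [
    {"M": 12, "N": 336, "P": 4, "Q": 7},
    {"M": 8, "N": 288, "P": 4, "Q": 9},
]


def usb_clock_mhz(hse_mhz, cfg):
    """USB clock = HSE / M * N / Q."""
    return hse_mhz * cfg["N"] / cfg["M"] / cfg["Q"]


# Try both candidate crystal frequencies against each working config.
for hse in (8, 12):
    for cfg in WORKING_CONFIGS:
        print(f"HSE={hse} MHz {cfg} -> USB {usb_clock_mhz(hse, cfg)} MHz")
```

Both working configs give 32 MHz with an 8 MHz HSE and 48 MHz with a 12 MHz HSE.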

I'm lost...

Post by dhylands » Fri May 24, 2019 4:46 am

Did you take flash latency into consideration, since the code is executing from flash?

Is your code available someplace (e.g. GitHub) so that I can build it for a 401 board with a known HSE crystal and compare results?

Post by jimmo » Fri May 24, 2019 5:14 am

A scope would be super handy :) Seriously, best addition to my workbench. Think about how many hours you've spent on this already.
I got a Rigol 1054Z a few years back (yes, I watch a lot of EEVBlog). It's entry-level but I love it. I also have the logic analyser recommended in this video: https://www.youtube.com/watch?v=xZ5wKYnCNcs It works great with sigrok/pulseview, and I would recommend it too. Both have been invaluable.

It does seem possible that HSE might actually be 12MHz. Like why else would Meowbit have chosen those crazy numbers in their config? (They're completely wrong for 8MHz)

Out of curiosity, have you tried this config (72 MHz sysclk)?

Code:

HSE   12 MHz
M     /6
N     *72
P     /2
Q     /3
Also, maybe a bit of a long shot, but back to the MCO idea -- you said "My logic analyzer doesn't have enough bandwidth, I presume". What do you have? Can you set the divider low enough? You really just need a pulse counter. Any other dev boards sitting around?
