[Any STM32 board] How much time this loop should take ?
Posted: Thu May 23, 2019 1:01 am
by shazz
Hi,
I'm debugging weird clock settings on my board and I was running some performance tests.
I added a C module doing:
Code:
#define NOP asm volatile(" nop \n\t")
STATIC mp_obj_t perf_count(size_t n_args, const mp_obj_t *args) {
    (void)n_args;
    uint32_t start = HAL_GetTick();
    uint32_t count = 0;
    uint32_t end = 0;
    for (count = 0; count < 10000000; count++) { NOP; }
    end = HAL_GetTick();
    return mp_obj_new_int(end - start);
}
STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(perf_count_obj, 0, 2, perf_count);
This loop takes between 477 and 715 ticks depending on the sysclk frequency (84 MHz and 56 MHz if set correctly, which I doubt),
so that's about 21 million loop iterations per second at best, right?
I don't know the ARM Cortex-M4 instruction timings (and I'm not sure where to find them for the STM32F4), and I'm also not sure which instructions gcc will generate for this loop. Something like this?
Code:
    movw r0, #0xffff
    movt r0, #0xffff
loop:
    sub r0, r0, #1
    cmp r0, #0
    bhi loop
Or said differently, is there a way to check that when sysclk is set to 84MHz, it's really 84MHz? (I doubt the HSE value is right.)
Thanks
Re: [Any STM32 board] How much time this loop should take ?
Posted: Thu May 23, 2019 1:22 am
by jimmo
shazz wrote: ↑Thu May 23, 2019 1:01 am
... and not sure also which instructions gcc will generate for this loop ?
arm-none-eabi-objdump is your friend!
Re: [Any STM32 board] How much time this loop should take ?
Posted: Thu May 23, 2019 1:42 am
by jimmo
shazz wrote: ↑Thu May 23, 2019 1:01 am
Or said differently, is there a way to check that when sysclk is set to 84MHz, it's really 84MHz? (I doubt the HSE value is right.)
Do you have a scope or signal analyser?
Sorry it's been a while since I've done this (and can't remember if it was for sysclk) but you should be able to enable the MCO AF on a pin and connect that to the clock.
Re: [Any STM32 board] How much time this loop should take ?
Posted: Thu May 23, 2019 4:03 am
by OutoftheBOTS_
You can output any of the clocks to the MCO pins (see the attached pic of the setup for the STM32F4), then read them with a scope.
- mco.JPG
Re: [Any STM32 board] How much time this loop should take ?
Posted: Thu May 23, 2019 8:32 am
by chuckbook
The loop of the asm example takes 4 cycles.
This will result in 4e7 cycles (with 1e7 passes).
At 84 MHz I would expect ~476 ms.
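The estimate above can be sketched as a small host-side helper. This is only an illustration: `expected_loop_ms` is a hypothetical name, and it assumes every pass costs the same number of cycles (no flash wait states, no interrupts).

```c
#include <assert.h>
#include <stdint.h>

/* Expected run time, in milliseconds, of a busy loop.
 * passes          : number of loop iterations
 * cycles_per_pass : CPU cycles per iteration (4 in the asm example)
 * sysclk_hz       : assumed core clock in Hz
 * Assumption: constant cost per pass; real hardware may add stalls. */
static uint32_t expected_loop_ms(uint32_t passes, uint32_t cycles_per_pass,
                                 uint32_t sysclk_hz)
{
    assert(sysclk_hz != 0);
    uint64_t total_cycles = (uint64_t)passes * cycles_per_pass;
    /* Integer division truncates, so 476.19 ms is reported as 476. */
    return (uint32_t)(total_cycles * 1000u / sysclk_hz);
}
```

With 10000000 passes of 4 cycles, `expected_loop_ms(10000000u, 4u, 84000000u)` gives 476, matching the ~476 ms figure.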
Re: [Any STM32 board] How much time this loop should take ?
Posted: Fri May 24, 2019 12:19 am
by shazz
Thanks Chuckbook, OutoftheBOTS, and Jimmo.
My logic analyzer doesn't have enough bandwidth, I presume... so objdump will do.
gcc is smarter than me:
Code:
00000000 <perf_count>:
0: b510 push {r4, lr}
2: f7ff fffe bl 0 <HAL_GetTick>
6: 4b06 ldr r3, [pc, #24] ; (20 <perf_count+0x20>)
8: 4604 mov r4, r0
--------------------------------
a: bf00 nop
c: 3b01 subs r3, #1
e: d1fc bne.n a <perf_count+0xa>
--------------------------------
10: f7ff fffe bl 0 <HAL_GetTick>
14: 1b00 subs r0, r0, r4
16: e8bd 4010 ldmia.w sp!, {r4, lr}
1a: f7ff bffe b.w 0 <mp_obj_new_int>
1e: bf00 nop
20: 00989680 b.l 10000000
So the loop takes 1 + 1 + 2 cycles (nop, subs, taken bne), which is still 4 cycles, NOP included. So my results totally make sense:
- 477ms at 84MHz (10 017 000 loops)
- 556ms at 72MHz (10 008 000 loops)
- 715ms at 56MHz (10 010 000 loops)
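The loop counts in the list above, and the clock frequency each measurement implies, can be reproduced with a short sketch. The function names are hypothetical, and it assumes one tick is 1 ms and the loop costs exactly 4 cycles per pass.

```c
#include <assert.h>
#include <stdint.h>

/* Number of passes that fit in elapsed_ms at the given clock,
 * assuming a fixed cycle cost per pass. */
static uint32_t implied_passes(uint32_t elapsed_ms, uint32_t sysclk_hz,
                               uint32_t cycles_per_pass)
{
    assert(cycles_per_pass != 0);
    uint64_t cycles = (uint64_t)elapsed_ms * sysclk_hz / 1000u;
    return (uint32_t)(cycles / cycles_per_pass);
}

/* Clock frequency implied by a measured run time for a known number
 * of passes. Tick resolution is 1 ms, so the result is only good to
 * a fraction of a percent here. */
static uint32_t implied_sysclk_hz(uint32_t passes, uint32_t cycles_per_pass,
                                  uint32_t elapsed_ms)
{
    assert(elapsed_ms != 0);
    uint64_t total_cycles = (uint64_t)passes * cycles_per_pass;
    return (uint32_t)(total_cycles * 1000u / elapsed_ms);
}
```

For example, `implied_passes(477u, 84000000u, 4u)` returns 10017000, and `implied_sysclk_hz(10000000u, 4u, 477u)` comes out at roughly 83.9 MHz, so the measurement is consistent with the configured 84 MHz under these assumptions.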
Then... I don't understand why the USB OTG only works when set at 32MHz (and not 48MHz as said in the source code):
Code:
// HSE is 8MHz
#define MICROPY_HW_CLK_PLLM (12)
#define MICROPY_HW_CLK_PLLN (336)
#define MICROPY_HW_CLK_PLLP (RCC_PLLP_DIV4)
#define MICROPY_HW_CLK_PLLQ (7)
Meaning (all values in MHz):
Code:
VCO = HSE*PLLN/PLLM = 224
USB, OTG, SDIO, RNG = VCO/PLLQ = 32
CPU = VCO/PLLP = 56
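The same arithmetic as a minimal host-side sketch, in case it helps to try other dividers quickly. `pll_cfg_t`, `pll_out_t` and `pll_outputs` are hypothetical names, and `p` is the plain divider value (4), not the register encoding.

```c
#include <assert.h>
#include <stdint.h>

/* PLL dividers/multiplier as plain numbers (hypothetical helper types). */
typedef struct { uint32_t m, n, p, q; } pll_cfg_t;
typedef struct { uint32_t vco_hz, sysclk_hz, usb_hz; } pll_out_t;

/* VCO = HSE * N / M, SYSCLK = VCO / P, USB clock = VCO / Q.
 * Does not check any datasheet limits; it only does the arithmetic. */
static pll_out_t pll_outputs(uint32_t hse_hz, pll_cfg_t cfg)
{
    assert(cfg.m != 0 && cfg.p != 0 && cfg.q != 0);
    pll_out_t out;
    out.vco_hz = (uint32_t)((uint64_t)hse_hz * cfg.n / cfg.m);
    out.sysclk_hz = out.vco_hz / cfg.p;
    out.usb_hz = out.vco_hz / cfg.q;
    return out;
}
```

With HSE = 8000000 and M=12, N=336, P=4, Q=7 this returns 224 MHz, 56 MHz and 32 MHz, the values listed above.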
Re: [Any STM32 board] How much time this loop should take ?
Posted: Fri May 24, 2019 12:45 am
by jimmo
I find the STM32CubeMX tool pretty handy for doing the clock calculations and some of this initial config stuff.
Here's a screenshot of your config:
- Screenshot_2019-05-24_10-41-51.png
The error info for the /M node is "PLLM output frequency is currently set to 0.666666 MHz. Must be >0.95 MHz and < 2.1 MHz"
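That check is easy to reproduce by hand. The sketch below uses the limits quoted in the error message (0.95 MHz and 2.1 MHz, exclusive); `vco_input_in_range` is a hypothetical name and the limits are taken from the message, not independently verified against the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if HSE / PLLM falls strictly inside the range quoted by
 * the CubeMX error message (> 0.95 MHz and < 2.1 MHz), else 0. */
static int vco_input_in_range(uint32_t hse_hz, uint32_t pllm)
{
    assert(pllm != 0);
    uint32_t vco_in_hz = hse_hz / pllm;
    return vco_in_hz > 950000u && vco_in_hz < 2100000u;
}
```

With an 8 MHz HSE, M=12 gives about 0.667 MHz and fails the check, while M=4 or M=8 would pass.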
The tool suggests (for 8MHz HSE and 48MHz USB):
Re: [Any STM32 board] How much time this loop should take ?
Posted: Fri May 24, 2019 3:01 am
by shazz
Ah!!! I did not know this tool! I spent hours on a spreadsheet to do something similar (but not as good, for sure...)
Nice tool. I updated the config to have the right APBx prescalers and SYSCLK:
- clock.png
Same "warning", but that's so weird, as the board works, and the USB too, with those settings.
To fix it, in my case it proposes:
Code:
M:4, N:72, P:2, Q:3 => USB:48, SYSCLK=72
I tried this fix and other settings which set the USB to 48MHz, as it should be, but they don't work.
The only settings I found working (with the USB) are:
Code:
M: 12, N: 336, P:4, Q:7
M: 8, N: 288, P:4, Q:9
see below, same USB freq: 32MHz...
- clock2.png
So I thought the HSE was wrong (with HSE=12MHz, it would generate a 48MHz USB and set all max values!)
- clockHSE12.png
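One way to look at the 12 MHz hypothesis is to turn the PLL formula around and ask which HSE value would make a given config produce a 48 MHz USB clock. A minimal sketch, with `hse_for_usb_hz` as a hypothetical name; it only rearranges USB = HSE * N / (M * Q) and says nothing about what the crystal really is.

```c
#include <assert.h>
#include <stdint.h>

/* HSE frequency that would yield usb_hz for the given M, N, Q:
 * HSE = USB * Q * M / N. */
static uint32_t hse_for_usb_hz(uint32_t usb_hz, uint32_t m, uint32_t n,
                               uint32_t q)
{
    assert(n != 0);
    return (uint32_t)((uint64_t)usb_hz * q * m / n);
}
```

Both working configs (M=12, N=336, Q=7 and M=8, N=288, Q=9) give 12000000 for a 48 MHz target, while the CubeMX suggestion (M=4, N=72, Q=3) gives 8000000.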
That's why I ran those performance tests but it seems HSE is really 8 MHz..
I'm lost...
Re: [Any STM32 board] How much time this loop should take ?
Posted: Fri May 24, 2019 4:46 am
by dhylands
Did you take flash latency into consideration, since the code is executing from flash?
Is your code available someplace (i.e github) that I can build for a 401 board with a known HSE crystal and compare results?
Re: [Any STM32 board] How much time this loop should take ?
Posted: Fri May 24, 2019 5:14 am
by jimmo
A scope would be super handy
Seriously, best addition to my workbench. Think about how many hours you've spent on this already.
I got a Rigol 1054Z a few years back (yes I watch a lot of EEVBlog), it's entry-level but I love it. I also have the logic analyser recommended in this video:
https://www.youtube.com/watch?v=xZ5wKYnCNcs (works great with sigrok/pulseview, would recommend too). Both have been invaluable.
It does seem possible that HSE might actually be 12MHz. Like why else would Meowbit have chosen those crazy numbers in their config? (They're completely wrong for 8MHz)
Out of curiosity, have you tried this config (72MHz sysclk)
Also, maybe a bit of a long shot, but back to the MCO idea -- you said "my logic analyzer doesn't have enough bandwidth I presume". What do you have? Can you set the divider down low enough? Like you just need a pulse counter? Any other dev boards sitting around?
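To get a feel for what the MCO pin would show, here is a rough sketch of the expected output frequency. It assumes the STM32F4 MCO prescaler range of 1 to 5 and that HSE can be routed to MCO1 (PA8 on most F4 parts), which should be double-checked against the reference manual and the board schematic. `mco_hz` is a hypothetical helper; the HAL call in the comment is the one from the STM32F4 HAL, shown only as a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Frequency expected on the MCO pin for a given source clock and
 * prescaler. Assumption: prescaler is limited to 1..5 on STM32F4.
 *
 * On target, the matching setup would be something like:
 *   HAL_RCC_MCOConfig(RCC_MCO1, RCC_MCO1SOURCE_HSE, RCC_MCODIV_5);
 * (not compiled here; the pin must be free on the board). */
static uint32_t mco_hz(uint32_t source_hz, uint32_t prescaler)
{
    assert(prescaler >= 1 && prescaler <= 5);
    return source_hz / prescaler;
}
```

Routing HSE out with a divider of 5 would give 1.6 MHz for an 8 MHz crystal and 2.4 MHz for a 12 MHz one, which a low-bandwidth logic analyzer or a pulse counter on another dev board may be able to tell apart.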