First, I'll start with the original Python code that I was trying to optimize:
Code: Select all
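def write_packet(self, packet_data):
    # Disable the receiver (clear the RE bit in CR1) while transmitting
    stm.mem16[self.cr1_addr] &= ~0x04
    self.uart.write(packet_data)
    # Re-enable the receiver so that the response can be seen
    stm.mem16[self.cr1_addr] |= 0x04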
Here is my first attempt at an inline assembler version of it:
Code: Select all
import pyb
import stm

@micropython.asm_thumb
def _write_packet(r0, r1, r2):      # uart(r0) buf(r1) len(r2)
    # Disable the Receiver
    ldr(r3, [r0, stm.USART_CR1])    # uart->CR1 &= ~USART_CR1_RE
    mov(r4, 0x04)                   #
    bic(r3, r4)                     #
    str(r3, [r0, stm.USART_CR1])    #

    add(r2, r2, r1)                 # buf_end(r2) = &buf(r1)[len(r2)]
    sub(r2, 1)                      # buf_end--

    # loop
    label(loop)
    cmp(r1, r2)
    bhi(endloop)                    # branch if buf > buf_end

    # Wait for the Transmit Data Register to be Empty
    mov(r4, 0x80)                   # while ((uart->SR & USART_SR_TXE) == 0) {
    # wait_txe                      #     ;
    label(wait_txe)                 #
    ldr(r3, [r0, stm.USART_SR])     #
    tst(r3, r4)                     #
    beq(wait_txe)                   # }

    # Disable interrupts from the time that we write the last character
    # until the tx complete bit is set. This ensures that we re-enable
    # the Rx as soon as possible after the last character has left
    cmp(r1, r2)
    bne(write_dr)                   # if buf == buf_end
    cpsid(i)                        #     disable_irq

    # write_dr
    label(write_dr)
    # Write one byte to the UART
    ldrb(r3, [r1, 0])               # uart->DR = *buf++
    add(r1, 1)                      #
    str(r3, [r0, stm.USART_DR])     #
    b(loop)

    # endloop
    label(endloop)
    # Wait for Transmit Complete (i.e. the last bit of transmitted data has left the shift register)
    mov(r4, 0x40)                   # while ((uart->SR & USART_SR_TC) == 0) {
    # wait_tx_complete              #     ;
    label(wait_tx_complete)         #
    ldr(r3, [r0, stm.USART_SR])     #
    tst(r3, r4)                     #
    beq(wait_txe)                   # }

    # Re-enable the receiver
    ldr(r3, [r0, stm.USART_CR1])    # uart->CR1 |= USART_CR1_RE
    mov(r4, 0x04)                   #
    orr(r3, r4)                     #
    str(r3, [r0, stm.USART_CR1])    #
    cpsie(i)                        # enable_irq

def test():
    uart = pyb.UART(6, 1000000)
    buf = bytearray(b'123456')
    _write_packet(stm.USART6, buf, len(buf))

test()
Running this produces a hard fault, so the first thing to do is to enable hard fault reporting in MicroPython. You'll need to edit the stmhal/stm32_it.c file. Around line 93 you'll find the line shown below; change the 0 to a 1 and rebuild the firmware:
Code: Select all
#define REPORT_HARD_FAULT_REGS 0
For this example, I connected an FTDI adapter to UART 4. I also added this line:
Code: Select all
pyb.repl_uart(pyb.UART(4, 115200))
With that in place, running the test code produced this hard fault report:
Code: Select all
R0 c0011400
R1 20003610
R2 00000006
R3 00000001
R12 0802efad
LR 0802f01b
PC 20004c52
XPSR 21000000
HFSR 40000000
CFSR 00008200
BFAR c001140c
FATAL ERROR:
HardFault
Damien provided me with a nifty little snippet of Python which can be used to determine the address in RAM of the inline assembler function. I then modified it slightly so that it prints out some Python code which, when run on the host, creates a binary file containing the opcodes for the inline assembler. Add the following inspect function to the test code:
Code: Select all
def inspect(f, nbytes=16):
    import stm
    import array
    import ubinascii

    @micropython.asm_thumb
    def dummy():
        pass

    if type(f) != type(dummy):
        raise ValueError('expecting an inline-assembler function')
    # Get the address of the function object as 4 little-endian bytes
    baddr = bytes(array.array('O', [f]))
    addr = baddr[0] | baddr[1] << 8 | baddr[2] << 16 | baddr[3] << 24
    print('function object at: 0x%08x' % addr)
    print('number of args: %u' % stm.mem32[addr + 4])
    code_addr = stm.mem32[addr + 8]
    print('machine code at: 0x%08x' % code_addr)
    print('----------')
    # Print a small host-side script which recreates the opcodes as code.bin
    print('import binascii')
    print("with open('code.bin', 'wb') as f:")
    hex_str = ubinascii.hexlify(bytearray([stm.mem8[code_addr + i] for i in range(nbytes)]))
    print("    f.write(binascii.unhexlify(%s))" % hex_str)
    print('----------')
Then call it before the call to _write_packet:
Code: Select all
inspect(_write_packet, 64)
Now I get the following report when the hard fault occurs (I added the backslash manually to keep the width of this post reasonable):
Code: Select all
function object at: 0x20003500
number of args: 3
machine code at: 0x200068f0
----------
import binascii
with open('code.bin', 'wb') as f:
    f.write(binascii.unhexlify(b'f2b5c3680424a343c3605218013a91420ad8802403682342fcd0914200d172b6\
0b7801314360f2e7402403682342f1d0c36804242343c36062b6f2bd00000000'))
----------
R0 c0011400
R1 20003cc0
R2 00000006
R3 00000001
R12 0802efad
LR 0802f01b
PC 200068f2
XPSR 21000000
HFSR 40000000
CFSR 00008200
BFAR c001140c
FATAL ERROR:
HardFault
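Before reaching for a disassembler, the dump can be sanity-checked on the host by splitting it into 16-bit Thumb halfwords. The following is only a sketch: it assumes the two halves of the hex string above are rejoined without the backslash, and thumb_halfwords is a made-up helper name, not part of MicroPython.

```python
import binascii
import struct

# The hex string printed by inspect(), with the manually added backslash removed.
HEX_DUMP = (
    "f2b5c3680424a343c3605218013a91420ad8802403682342fcd0914200d172b6"
    "0b7801314360f2e7402403682342f1d0c36804242343c36062b6f2bd00000000"
)

def thumb_halfwords(hex_dump):
    """Return the dump as a list of little-endian 16-bit values (hypothetical helper)."""
    code = binascii.unhexlify(hex_dump)
    count = len(code) // 2
    return list(struct.unpack("<%dH" % count, code[:count * 2]))

halfwords = thumb_halfwords(HEX_DUMP)
# Show the first few halfwords next to their byte offsets.
for offset in (0, 2, 4):
    print("%2x: %04x" % (offset, halfwords[offset // 2]))
```

The byte offset of each halfword is simply twice its index in the list, which is the same offset a disassembly listing of the file would use.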
I then ran the ARM disassembler on that file using the command:
Code: Select all
arm-none-eabi-objdump -bbinary -marm --disassemble-all code.bin -Mforce-thumb
This produced the following listing:
Code: Select all
code.bin: file format binary
Disassembly of section .data:
00000000 <.data>:
0: b5f2 push {r1, r4, r5, r6, r7, lr}
2: 68c3 ldr r3, [r0, #12]
4: 2404 movs r4, #4
6: 43a3 bics r3, r4
8: 60c3 str r3, [r0, #12]
a: 1852 adds r2, r2, r1
c: 3a01 subs r2, #1
e: 4291 cmp r1, r2
10: d80a bhi.n 0x28
12: 2480 movs r4, #128 ; 0x80
14: 6803 ldr r3, [r0, #0]
16: 4223 tst r3, r4
18: d0fc beq.n 0x14
1a: 4291 cmp r1, r2
1c: d100 bne.n 0x20
1e: b672 cpsid i
20: 780b ldrb r3, [r1, #0]
22: 3101 adds r1, #1
24: 6043 str r3, [r0, #4]
26: e7f2 b.n 0xe
28: 2440 movs r4, #64 ; 0x40
2a: 6803 ldr r3, [r0, #0]
2c: 4223 tst r3, r4
2e: d0f1 beq.n 0x14
30: 68c3 ldr r3, [r0, #12]
32: 2404 movs r4, #4
34: 4323 orrs r3, r4
36: 60c3 str r3, [r0, #12]
38: b662 cpsie i
3a: bdf2 pop {r1, r4, r5, r6, r7, pc}
3c: 0000 movs r0, r0
...
If we take the PC where the hard fault occurred (200068f2) and subtract the value printed on the 'machine code at: 0x200068f0' line (immediately before the first line of dashes), we get an offset of 2. Looking at the disassembly listing, this means it crashed on the very first instruction that I wrote (the push at offset 0 is the function entry code):
Code: Select all
ldr r3, [r0, #12]
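The subtraction is easy to get wrong by hand, so here is a small host-side sketch of it. The function name fault_offset and the LISTING dictionary are my own illustration; the addresses and instructions are copied from the reports and listing in this post.

```python
def fault_offset(pc, code_addr):
    """Offset of the faulting instruction within the machine code (hypothetical helper)."""
    return pc - code_addr

# A few lines of the disassembly listing, keyed by offset.
LISTING = {
    0x0: "push {r1, r4, r5, r6, r7, lr}",
    0x2: "ldr r3, [r0, #12]",
    0x4: "movs r4, #4",
}

offset = fault_offset(0x200068f2, 0x200068f0)
print("offset 0x%x: %s" % (offset, LISTING[offset]))
```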
The hard fault was caused by trying to access memory location c001140c (r0 plus the offset of 12), and this is confirmed by the contents of the BFAR (Bus Fault Address Register).
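The other fault registers in the report tell the same story. As a rough sketch (the bit names are the Cortex-M fault status flags as I know them; the decode helper is made up for this post), the CFSR and HFSR values can be decoded on the host like this:

```python
# Bit positions in the Configurable Fault Status Register (CFSR).
CFSR_BITS = {
    0: "IACCVIOL", 1: "DACCVIOL", 3: "MUNSTKERR", 4: "MSTKERR",
    5: "MLSPERR", 7: "MMARVALID",
    8: "IBUSERR", 9: "PRECISERR", 10: "IMPRECISERR", 11: "UNSTKERR",
    12: "STKERR", 13: "LSPERR", 15: "BFARVALID",
    16: "UNDEFINSTR", 17: "INVSTATE", 18: "INVPC", 19: "NOCP",
    24: "UNALIGNED", 25: "DIVBYZERO",
}

# Bit positions in the HardFault Status Register (HFSR).
HFSR_BITS = {1: "VECTTBL", 30: "FORCED", 31: "DEBUGEVT"}

def decode(value, names):
    """Return the names of the bits that are set in value (hypothetical helper)."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]

print("CFSR:", decode(0x00008200, CFSR_BITS))
print("HFSR:", decode(0x40000000, HFSR_BITS))
```

For the values in the report this gives a precise bus error with a valid BFAR, escalated ("forced") to a hard fault, which is why the BFAR contents can be trusted here.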
r0 is the first argument to the _write_packet function, which is supposed to be the address of the USART6 peripheral, which is 40011400.
Ah-ha. MicroPython passed the value as a small int, which means that bits 30 and 31 are always the same (bit 31 is just a copy of bit 30). Since 40011400 has bit 30 set, it arrived as c0011400, so we need to modify our routine to mask away that top bit.
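To see the effect without a board, here is a simplified host-side model. It assumes a small int carries 31 bits that are sign-extended into bit 31 when handed to the assembler; that is my reading of the register dump rather than a statement about MicroPython's internals, and as_small_int_31 is a made-up name.

```python
def as_small_int_31(value):
    """Model a 32-bit value squeezed through a 31-bit, sign-extended small int."""
    payload = value & 0x7FFFFFFF      # only 31 bits survive
    if payload & 0x40000000:          # bit 30 acts as the sign bit...
        payload |= 0x80000000         # ...and is copied into bit 31
    return payload

usart6 = 0x40011400
seen_by_asm = as_small_int_31(usart6)
print("passed in:   0x%08x" % usart6)
print("seen in r0:  0x%08x" % seen_by_asm)
print("after mask:  0x%08x" % (seen_by_asm & 0x7FFFFFFF))
```

Masking with 0x7fffffff only recovers the original value because the peripheral address has bit 31 clear to begin with.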
Note: MicroPython has since been fixed so that it can now pass in full 32-bit integers to inline assembler functions, and these extra lines of masking are no longer required. At the time I added these lines to the beginning of my inline assembler function:
Code: Select all
movw(r3, 0xffff) # uart(r0) &= 0x7fffffff
movt(r3, 0x7fff) #
and_(r0, r3) #
With the mask in place, r0 was correct, but the fault moved to a different location:
Code: Select all
function object at: 0x20003500
number of args: 3
machine code at: 0x20006940
----------
import binascii
with open('code.bin', 'wb') as f:
    f.write(binascii.unhexlify(b'f2b54ff6ff73c7f6ff731840c3680424a343c3605218013a91420ad8802403682342fcd0914200d172b6\
0b7801314360f2e7402403682342f1d0c36804242343'))
----------
R0 40011400
R1 20020000
R2 20003ac5
R3 000000c0
R12 0802efad
LR 0802f01b
PC 2000696a
XPSR 21000000
HFSR 40000000
CFSR 00008200
BFAR 20020000
FATAL ERROR:
HardFault
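This time the PC (2000696a) minus the code address (20006940) gives an offset of 0x2a, which falls in this part of the new disassembly:
Code: Select all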
26: d100 bne.n 0x2a
28: b672 cpsid i
2a: 780b ldrb r3, [r1, #0]
2c: 3101 adds r1, #1
2e: 6043 str r3, [r0, #4]
After looking at the assembler some more, I noticed that I had a copy/paste error at the end of my wait_tx_complete loop. It had a branch to wait_txe rather than wait_tx_complete. I fixed that and the routine now works properly.
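The second register dump (R1 at 20020000, well past the buffer end held in R2) fits that bug: each time TC is still clear at the end, the wrong branch drops back into the write path and sends one more byte. The sketch below is my own simplified model of the control flow, not the real hardware; it assumes the status flag reads as set a fixed number of polls after each write, and simulate is a hypothetical name.

```python
def simulate(buf_len, buggy, max_writes=32):
    """Return how many bytes the modelled routine writes to the UART."""
    r1, r2 = 0, buf_len - 1   # buffer index and index of the last byte
    idle_polls = 2            # polls after a write before the flag reads as set (assumed)
    polls = idle_polls        # the flags start out set
    writes = 0
    state = 'loop'
    while writes < max_writes:
        if state == 'loop':
            # cmp(r1, r2) / bhi(endloop)
            state = 'endloop' if r1 > r2 else 'wait_txe'
        elif state == 'wait_txe':
            # Poll the status register until the tested flag is set.
            polls += 1
            if polls >= idle_polls:
                state = 'write_dr'
        elif state == 'write_dr':
            # uart->DR = *buf++ clears the flags again.
            r1 += 1
            writes += 1
            polls = 0
            state = 'loop'
        elif state == 'endloop':
            # wait_tx_complete: poll TC once.
            polls += 1
            if polls >= idle_polls:
                return writes   # TC set: fall through and re-enable the receiver
            # The copy/paste bug branches to wait_txe instead of wait_tx_complete.
            state = 'wait_txe' if buggy else 'endloop'
    return writes               # gave up: the routine ran away past the buffer

print("fixed routine writes:", simulate(6, buggy=False))
print("buggy routine writes:", simulate(6, buggy=True))
```

In the model the fixed routine stops after the six bytes of the buffer, while the buggy one keeps writing until the max_writes cap; on the real board the equivalent cap was the end of RAM.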
The _write_packet routine is part of some code I've been working on for controlling the Bioloid servos in my six-legged walker (or my brother's four-legged walker named Roz: http://forum.micropython.org/viewtopic. ... &hilit=Roz )
The reason I needed to write the inline assembler at all was that, when running on the Espruino Pico (which only runs at 84 MHz), the delay between the uart.write and the re-enabling of the receiver was too long, and I was missing the response packet from the servo. Using the inline assembler routine, I never miss a response packet.