Debugging Hard Faults in Inline Assembler (Long)
Posted: Sat Jan 09, 2016 8:49 pm
I recently wrote some inline assembler code, and the first time I ran it, I ran into some Hard Faults. Since some of the techniques for finding the problems in my code seemed useful, I thought I would write up an article that describes them.
First, I'll start out with the original Python code that I was trying to optimize:
Code: Select all
def write_packet(self, packet_data):
    stm.mem16[self.cr1_addr] &= ~0x04    # disable the receiver (clear USART_CR1_RE)
    self.uart.write(packet_data)
    stm.mem16[self.cr1_addr] |= 0x04     # re-enable the receiver
My first cut at the inline assembler code looked something like this:
Code: Select all
import pyb
import stm

@micropython.asm_thumb
def _write_packet(r0, r1, r2):      # uart(r0) buf(r1) len(r2)
    # Disable the Receiver
    ldr(r3, [r0, stm.USART_CR1])    # uart->CR1 &= ~USART_CR1_RE
    mov(r4, 0x04)                   #
    bic(r3, r4)                     #
    str(r3, [r0, stm.USART_CR1])    #
    add(r2, r2, r1)                 # buf_end(r2) = &buf(r1)[len(r2)]
    sub(r2, 1)                      # buf_end--
    # loop
    label(loop)
    cmp(r1, r2)
    bhi(endloop)                    # branch if buf > buf_end
    # Wait for the Transmit Data Register to be Empty
    mov(r4, 0x80)                   # while ((uart->SR & USART_SR_TXE) == 0) {
    # wait_txe                      #     ;
    label(wait_txe)                 #
    ldr(r3, [r0, stm.USART_SR])     #
    tst(r3, r4)                     #
    beq(wait_txe)                   # }
    # Disable interrupts from the time that we write the last character
    # until the tx complete bit is set. This ensures that we re-enable
    # the Rx as soon as possible after the last character has left
    cmp(r1, r2)
    bne(write_dr)                   # if buf == buf_end
    cpsid(i)                        #     disable_irq
    # write_dr
    label(write_dr)
    # Write one byte to the UART
    ldrb(r3, [r1, 0])               # uart->DR = *buf++
    add(r1, 1)                      #
    str(r3, [r0, stm.USART_DR])     #
    b(loop)
    # endloop
    label(endloop)
    # Wait for Transmit Complete (i.e. the last bit of transmitted data has left the shift register)
    mov(r4, 0x40)                   # while ((uart->SR & USART_SR_TC) == 0) {
    # wait_tx_complete              #     ;
    label(wait_tx_complete)         #
    ldr(r3, [r0, stm.USART_SR])     #
    tst(r3, r4)                     #
    beq(wait_txe)                   # }
    # Re-enable the receiver
    ldr(r3, [r0, stm.USART_CR1])    # uart->CR1 |= USART_CR1_RE
    mov(r4, 0x04)                   #
    orr(r3, r4)                     #
    str(r3, [r0, stm.USART_CR1])    #
    cpsie(i)                        # enable_irq

def test():
    uart = pyb.UART(6, 1000000)
    buf = bytearray(b'123456')
    _write_packet(stm.USART6, buf, len(buf))

test()
When I tried to run it, I got an immediate hard fault (all of the LEDs on the pyboard turn on).
The first thing to do is to enable hard fault reporting in MicroPython. You'll need to edit the stmhal/stm32_it.c file. Around line 93 you'll find a line that says:
Code: Select all
#define REPORT_HARD_FAULT_REGS 0
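Change the 0 to 1, and then rebuild and reflash the firmware. Since the processor more or less shuts down as soon as the hard fault occurs, you'll also need to hook up an external UART device in order to capture the information printed by the hard fault handler.
For this example, I connected an FTDI adapter to UART 4. I also added this line to the beginning of the test function above: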
Code: Select all
pyb.repl_uart(pyb.UART(4, 115200))
Now I get the following fault reported on UART 4:
Code: Select all
R0 c0011400
R1 20003610
R2 00000006
R3 00000001
R12 0802efad
LR 0802f01b
PC 20004c52
XPSR 21000000
HFSR 40000000
CFSR 00008200
BFAR c001140c
FATAL ERROR:
HardFault
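The PC register shows that the fault occurred when the program counter was equal to 20004c52. Since the inline assembler code will have been allocated from the heap, this all seems reasonable. But how to correlate that address back to my inline assembler function?
Damien provided me with a nifty little snippet of Python which could be used to determine the address in RAM of the inline assembler function. I then modified it slightly to have it print out some Python code which, when run on the host, creates a binary file with the opcodes for the inline assembler. Add the following inspect function to the test code: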
Code: Select all
def inspect(f, nbytes=16):
    import stm
    import array
    import ubinascii
    @micropython.asm_thumb
    def dummy():
        pass
    if type(f) != type(dummy):
        raise ValueError('expecting an inline-assembler function')
    # Storing f in an array of objects ('O') exposes the address of the function object
    baddr = bytes(array.array('O', [f]))
    addr = baddr[0] | baddr[1] << 8 | baddr[2] << 16 | baddr[3] << 24
    print('function object at: 0x%08x' % addr)
    print('number of args: %u' % stm.mem32[addr + 4])
    code_addr = stm.mem32[addr + 8]
    print('machine code at: 0x%08x' % code_addr)
    # Everything printed between the dashes is a host-side script that writes code.bin
    print('----------')
    print('import binascii')
    print("with open('code.bin', 'wb') as f:")
    hex_str = ubinascii.hexlify(bytearray([stm.mem8[code_addr + i] for i in range(nbytes)]))
    print("    f.write(binascii.unhexlify(%s))" % hex_str)
    print('----------')
Then add the following to the beginning of the test function:
Code: Select all
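inspect(_write_packet, 64)
You can get a reasonable guess as to the size of your inline assembler by counting the number of opcodes (not including labels) and multiplying by 2. I then round up to the next multiple of 16 (I counted 28 opcodes = 28 * 2 = 56, which came out to 64 when I rounded up).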
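The 64 passed to inspect comes from counting opcodes, multiplying by 2, and rounding up to a multiple of 16. Here is a minimal host-side sketch of that arithmetic; the helper name estimate_code_size is made up for illustration and is not part of MicroPython:

```python
def estimate_code_size(num_opcodes, align=16):
    """Estimate the byte count to dump: 2 bytes per opcode, rounded up to `align`."""
    raw = num_opcodes * 2                       # 16-bit Thumb opcodes are 2 bytes each
    return (raw + align - 1) // align * align   # round up to the next multiple of align

print(estimate_code_size(28))   # 28 opcodes -> 56 bytes -> 64
```

Keep in mind this is only an estimate: 32-bit Thumb-2 instructions such as movw and movt take 4 bytes, so count each of those as two opcodes.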
Now I get the following report when the hard fault occurs (I added the backslash manually to keep the width of this post reasonable):
Code: Select all
function object at: 0x20003500
number of args: 3
machine code at: 0x200068f0
----------
import binascii
with open('code.bin', 'wb') as f:
    f.write(binascii.unhexlify(b'f2b5c3680424a343c3605218013a91420ad8802403682342fcd0914200d172b6\
0b7801314360f2e7402403682342f1d0c36804242343c36062b6f2bd00000000'))
----------
R0 c0011400
R1 20003cc0
R2 00000006
R3 00000001
R12 0802efad
LR 0802f01b
PC 200068f2
XPSR 21000000
HFSR 40000000
CFSR 00008200
BFAR c001140c
FATAL ERROR:
HardFault
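If you copy the lines between the dashes into a new Python file on your host machine and execute it, it will produce a file named code.bin.
I then ran the ARM disassembler on that file using the command: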
Code: Select all
arm-none-eabi-objdump -bbinary -marm --disassemble-all code.bin -Mforce-thumb
It generates this output:
Code: Select all
code.bin: file format binary
Disassembly of section .data:
00000000 <.data>:
0: b5f2 push {r1, r4, r5, r6, r7, lr}
2: 68c3 ldr r3, [r0, #12]
4: 2404 movs r4, #4
6: 43a3 bics r3, r4
8: 60c3 str r3, [r0, #12]
a: 1852 adds r2, r2, r1
c: 3a01 subs r2, #1
e: 4291 cmp r1, r2
10: d80a bhi.n 0x28
12: 2480 movs r4, #128 ; 0x80
14: 6803 ldr r3, [r0, #0]
16: 4223 tst r3, r4
18: d0fc beq.n 0x14
1a: 4291 cmp r1, r2
1c: d100 bne.n 0x20
1e: b672 cpsid i
20: 780b ldrb r3, [r1, #0]
22: 3101 adds r1, #1
24: 6043 str r3, [r0, #4]
26: e7f2 b.n 0xe
28: 2440 movs r4, #64 ; 0x40
2a: 6803 ldr r3, [r0, #0]
2c: 4223 tst r3, r4
2e: d0f1 beq.n 0x14
30: 68c3 ldr r3, [r0, #12]
32: 2404 movs r4, #4
34: 4323 orrs r3, r4
36: 60c3 str r3, [r0, #12]
38: b662 cpsie i
3a: bdf2 pop {r1, r4, r5, r6, r7, pc}
3c: 0000 movs r0, r0
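...
This correlates exactly with the inline assembler we wrote. MicroPython added a push instruction at the beginning and a pop instruction at the end, but the rest is from the inline assembler.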
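If you don't have the ARM toolchain handy, you can at least sanity-check the dump against the listing from plain Python on the host. This is a rough sketch that uses only the first 16 bytes of the hex string printed above; halfword_at is a hypothetical helper name, and it only reads raw opcodes rather than disassembling them:

```python
import binascii
import struct

# First 16 bytes of the machine code printed by inspect()
code = binascii.unhexlify(b'f2b5c3680424a343c3605218013a9142')

def halfword_at(buf, offset):
    """Return the little-endian 16-bit opcode stored at `offset`."""
    return struct.unpack_from('<H', buf, offset)[0]

for offset in range(0, 8, 2):
    print('%2x: %04x' % (offset, halfword_at(code, offset)))
```

The values at offsets 0 and 2 come out as b5f2 and 68c3, which match the push and ldr lines in the objdump listing.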
If we take the PC where the hard fault occurred, 200068f2, and subtract the value printed on the 'machine code at: 0x200068f0' line (immediately before the first line of dashes), we get an offset of 2. Looking at the disassembly listing, this means it crashed on the very first line of the function:
Code: Select all
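ldr r3, [r0, #12]
This reads from the memory at r0 + 12 and stores the result in r3. Fortunately, the hard fault dump includes registers R0 to R3, and we see that R0 has the value c0011400.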
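The PC subtraction is easy to get wrong by hand, so here it is as a tiny host-side sketch. The function name fault_offset is just something I picked for illustration; the two addresses are the ones from the dump above:

```python
def fault_offset(pc, code_addr):
    """Offset of the faulting instruction from the start of the machine code."""
    return pc - code_addr

pc = 0x200068f2          # PC from the hard fault dump
code_addr = 0x200068f0   # 'machine code at:' line printed by inspect()
print('offset: 0x%x' % fault_offset(pc, code_addr))   # offset: 0x2
```

The result is the value to look up in the left-hand column of the objdump listing.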
The hard fault was caused by trying to access memory location c001140c, which is also confirmed by the contents of the BFAR (Bus Fault Address Register).
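The HFSR and CFSR values in the dump tell the same story once the bits are decoded. The sketch below is host-side Python with bit names taken from the ARMv7-M fault status register layout; the tables only cover the bus fault bits of CFSR (not the memory management or usage fault bits), and the decode helper is a name I made up:

```python
# Bus fault bits of CFSR (bits 8-15) and the HFSR bits, per the ARMv7-M layout
CFSR_BUS_BITS = {
    8: 'IBUSERR', 9: 'PRECISERR', 10: 'IMPRECISERR',
    11: 'UNSTKERR', 12: 'STKERR', 15: 'BFARVALID',
}
HFSR_BITS = {1: 'VECTTBL', 30: 'FORCED', 31: 'DEBUGEVT'}

def decode(value, bit_names):
    """Return the names of the bits that are set in `value`, lowest bit first."""
    return [name for bit, name in sorted(bit_names.items()) if value & (1 << bit)]

print(decode(0x00008200, CFSR_BUS_BITS))   # ['PRECISERR', 'BFARVALID']
print(decode(0x40000000, HFSR_BITS))       # ['FORCED']

# BFARVALID says BFAR can be trusted, and it matches R0 + 12 from the dump
assert 0xc0011400 + 12 == 0xc001140c
```

PRECISERR plus BFARVALID means a precise data bus error with a valid fault address in BFAR, and FORCED means the bus fault was escalated to a hard fault.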
r0 is the first argument to the _write_packet function, which is supposed to be the address of the USART6 peripheral, which is 40011400.
Ah-ha. MicroPython passed the value as a small int, which means that bits 30 and 31 are always the same, so we need to modify our routine to mask off the top bit (bit 31).
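To see how 40011400 can turn into c0011400, here is a small host-side model of that explanation. It is only a sketch of the "bits 30 and 31 are always the same" behaviour described above, not MicroPython's actual conversion code, and as_small_int_word is a hypothetical name:

```python
def as_small_int_word(value):
    """Model a word where bit 31 is always a copy of bit 30."""
    low31 = value & 0x7fffffff
    if low31 & 0x40000000:          # bit 30 set, so bit 31 gets set too
        return low31 | 0x80000000
    return low31

usart6 = 0x40011400                 # intended peripheral address
seen = as_small_int_word(usart6)
print('%08x' % seen)                  # c0011400, the R0 value in the dump
print('%08x' % (seen & 0x7fffffff))   # 40011400, after masking off bit 31
```

Masking with 0x7fffffff recovers the intended address here because the real USART6 address has bit 31 clear.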
Note: MicroPython has since been fixed so that it can now pass in full 32-bit integers to inline assembler functions, and these extra lines of masking are no longer required. At the time I added these lines to the beginning of my inline assembler function:
Code: Select all
movw(r3, 0xffff) # uart(r0) &= 0x7fffffff
movt(r3, 0x7fff) #
and_(r0, r3) #
I then reran it, which caused another hard fault to occur:
Code: Select all
function object at: 0x20003500
number of args: 3
machine code at: 0x20006940
----------
import binascii
with open('code.bin', 'wb') as f:
    f.write(binascii.unhexlify(b'f2b54ff6ff73c7f6ff731840c3680424a343c3605218013a91420ad8802403682342fcd0914200d172b6\
0b7801314360f2e7402403682342f1d0c36804242343'))
----------
R0 40011400
R1 20020000
R2 20003ac5
R3 000000c0
R12 0802efad
LR 0802f01b
PC 2000696a
XPSR 21000000
HFSR 40000000
CFSR 00008200
BFAR 20020000
FATAL ERROR:
HardFault
Now we have a PC of 2000696a and a machine code address of 0x20006940, yielding an offset of 0x2a:
Code: Select all
26: d100 bne.n 0x2a
28: b672 cpsid i
2a: 780b ldrb r3, [r1, #0]
2c: 3101 adds r1, #1
2e: 6043 str r3, [r0, #4]
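This time the BFAR shows 20020000, which is 1 byte beyond the end of RAM, and R1 also contains 20020000. Somehow the code didn't stop at the right place and kept zipping right through memory until it tried to access beyond the end of RAM.
After looking at the assembler some more, I noticed that I had a copy/paste error at the end of my wait_tx_complete loop. It had a branch to wait_txe rather than wait_tx_complete. I fixed that and the routine now works properly.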
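In hindsight, the copy/paste error was already visible in the first disassembly listing: both beq instructions branch to the same address. Here is a minimal host-side sketch that decodes the target of a 16-bit Thumb conditional branch (encoding 1101 cond imm8); the function name is made up, and it handles only this one encoding:

```python
def cond_branch_target(addr, opcode):
    """Target of a 16-bit Thumb conditional branch located at `addr`."""
    imm8 = opcode & 0xff
    if imm8 & 0x80:
        imm8 -= 0x100               # sign-extend the 8-bit offset
    return addr + 4 + imm8 * 2      # PC reads as the instruction address + 4

print(hex(cond_branch_target(0x18, 0xd0fc)))   # 0x14, the wait_txe loop
print(hex(cond_branch_target(0x2e, 0xd0f1)))   # 0x14 again, should have been 0x2a
```

The second branch sits at the end of the wait_tx_complete loop, so its target should have been 0x2a (the ldr that follows movs r4, #64) rather than 0x14.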
The _write_packet routine is part of some code I've been working on for controlling the bioloid servos in my six-legged walker (or my brother's four-legged walker named Roz: http://forum.micropython.org/viewtopic. ... &hilit=Roz )
The reason I needed to write the inline assembler at all was that, when running on the Espruino Pico (which only runs at 84 MHz), the time delay between the uart.write and the re-enabling of the receiver was too long and I was missing the response packet from the servo. Using the inline assembler routine, I never miss a response packet.