Debugging Hard Faults in Inline Assembler (Long)

dhylands


Post by dhylands » Sat Jan 09, 2016 8:49 pm

I recently wrote some inline assembler code, and the first time I ran it, I ran into some Hard Faults. Since some of the techniques for finding the problems in my code seemed useful, I thought I would write up an article that describes them.

First, I'll start out with the original Python code that I was trying to optimize:

Code:

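import stm

def write_packet(self, packet_data):
    # Disable the UART receiver (clear the RE bit, 0x04, in USART CR1)
    stm.mem16[self.cr1_addr] &= ~0x04
    self.uart.write(packet_data)
    # Re-enable the receiver now that the packet has been written
    stm.mem16[self.cr1_addr] |= 0x04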
My first cut at the inline assembler code looked something like this:

Code:

import pyb
import stm
    
@micropython.asm_thumb
def _write_packet(r0, r1, r2):      # uart(r0) buf(r1) len(r2)

    # Disable the Receiver

    ldr(r3, [r0, stm.USART_CR1])    # uart->CR1 &= ~USART_CR1_RE
    mov(r4, 0x04)                   #
    bic(r3, r4)                     #
    str(r3, [r0, stm.USART_CR1])    #

    add(r2, r2, r1)                 # buf_end(r2) = &buf(r1)[len(r2)]
    sub(r2, 1)                      # buf_end--

# loop
    label(loop)
    cmp(r1, r2)
    bhi(endloop)                    # branch if buf > buf_end
    
    # Wait for the Transmit Data Register to be Empty

    mov(r4, 0x80)                   # while ((uart->SR & USART_SR_TXE) == 0) {
# wait_txe                          #   ;
    label(wait_txe)                 #
    ldr(r3, [r0, stm.USART_SR])     #
    tst(r3, r4)                     #
    beq(wait_txe)                   # }

    # Disable interrupts from the time that we write the last character
    # until the tx complete bit is set. This ensures that we re-enable
    # the Rx as soon as possible after the last character has left
    cmp(r1, r2)
    bne(write_dr)                   # if buf ==  buf_end
    cpsid(i)                        #   disable_irq
# write_dr
    label(write_dr)

    # Write one byte to the UART

    ldrb(r3, [r1, 0])               # uart->DR = *buf++
    add(r1, 1)                      #
    str(r3, [r0, stm.USART_DR])     #

    b(loop)
# endloop
    label(endloop)

    # Wait for Transmit Complete (i.e. the last bit of transmitted data has left the shift register)

    mov(r4, 0x40)                   # while ((uart->SR & USART_SR_TC) == 0) {
# wait_tx_complete                  #   ;
    label(wait_tx_complete)         #
    ldr(r3, [r0, stm.USART_SR])     #
    tst(r3, r4)                     #
    beq(wait_txe)                   # }

    # Re-enable the receiver

    ldr(r3, [r0, stm.USART_CR1])    # uart->CR1 |= USART_CR1_RE
    mov(r4, 0x04)                   #
    orr(r3, r4)                     #
    str(r3, [r0, stm.USART_CR1])    #

    cpsie(i)                        # enable_irq

def test():
    uart = pyb.UART(6, 1000000)
    buf = bytearray(b'123456')
    _write_packet(stm.USART6, buf, len(buf))

test()
and when I tried to run it, I got an immediate hard fault (all of the LEDs on the pyboard turned on).

The first thing to do is to enable hard fault reporting in MicroPython. You'll need to edit the stmhal/stm32_it.c file. Around line 93 you'll find a line that says:

Code:

#define REPORT_HARD_FAULT_REGS  0
Change the 0 to 1, and then rebuild and reflash the firmware. Since the processor more or less shuts down as soon as the hard fault occurs, you'll also need to hook up an external UART device in order to capture the information printed by the hard fault handler.

For this example, I connected an FTDI adapter to UART 4. I also added this line:

Code:

    pyb.repl_uart(pyb.UART(4, 115200))
to the beginning of the test function above. Now I get the following fault reported on UART 4:

Code:

R0    c0011400
R1    20003610
R2    00000006
R3    00000001
R12   0802efad
LR    0802f01b
PC    20004c52
XPSR  21000000
HFSR  40000000
CFSR  00008200
BFAR  c001140c

FATAL ERROR:
HardFault
The PC register shows that the fault occurred when the program counter was equal to 20004c52. Since the inline assembler code is allocated from the heap (which lives in SRAM starting at 0x20000000, whereas the firmware itself runs from flash at 0x08000000), this address seems reasonable. But how do we correlate that address back to the inline assembler function?

Damien provided me with a nifty little snippet of Python that determines the address in RAM of an inline assembler function. I modified it slightly so that it also prints out some Python code which, when run on the host, creates a binary file containing the opcodes of the inline assembler function. Add the following inspect function to the test code:

Code:

def inspect(f, nbytes=16):
    import stm
    import array
    import ubinascii
    @micropython.asm_thumb
    def dummy():
        pass
    if type(f) != type(dummy):
        raise ValueError('expecting an inline-assembler function')
    # Storing the function object in an 'O' (object) array exposes its raw pointer
    baddr = bytes(array.array('O', [f]))
    addr = baddr[0] | baddr[1] << 8 | baddr[2] << 16 | baddr[3] << 24
    print('function object at: 0x%08x' % addr)
    # Field offsets of the inline-asm function object (specific to the firmware version)
    print('number of args: %u' % stm.mem32[addr + 4])
    code_addr = stm.mem32[addr + 8]
    print('machine code at: 0x%08x' % code_addr)
    print('----------')
    # Emit a host-side script that recreates the machine code as code.bin
    print('import binascii')
    print("with open('code.bin', 'wb') as f:")
    hex_str = ubinascii.hexlify(bytearray([stm.mem8[code_addr + i] for i in range(nbytes)]))
    print("    f.write(binascii.unhexlify(%s))" % hex_str)
    print('----------')
and add the following to the beginning of the test function:

Code:

    inspect(_write_packet, 64)
You can get a reasonable guess at the size of your inline assembler function by counting the number of opcodes (not including labels) and multiplying by 2. I then round up to the next multiple of 16 (I counted 28 opcodes: 28 * 2 = 56, which came out to 64 when I rounded up).
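The estimate above is simple arithmetic. As a sketch (estimate_code_bytes is a hypothetical host-side helper, not part of MicroPython), it looks like this. Treat the result as a lower bound on the real size: 32-bit Thumb-2 instructions such as movw and movt take 4 bytes instead of 2, and MicroPython adds its own push and pop, which is one reason rounding up is worthwhile.

```python
def estimate_code_bytes(n_opcodes, bytes_per_opcode=2, align=16):
    """Opcode count times 2, rounded up to the next multiple of `align`."""
    raw = n_opcodes * bytes_per_opcode
    return (raw + align - 1) // align * align


print(estimate_code_bytes(28))  # 28 * 2 = 56, rounded up to 64
```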

Now I get the following report when the hard fault occurs (I added the backslash manually to keep the width of this post reasonable):

Code:

function object at: 0x20003500
number of args: 3
machine code at: 0x200068f0
----------
import binascii
with open('code.bin', 'wb') as f:
    f.write(binascii.unhexlify(b'f2b5c3680424a343c3605218013a91420ad8802403682342fcd0914200d172b6\
0b7801314360f2e7402403682342f1d0c36804242343c36062b6f2bd00000000'))
----------
R0    c0011400
R1    20003cc0
R2    00000006
R3    00000001
R12   0802efad
LR    0802f01b
PC    200068f2
XPSR  21000000
HFSR  40000000
CFSR  00008200
BFAR  c001140c

FATAL ERROR:
HardFault
If you copy the lines between the dashes into a new Python file on your host machine and run it, it will produce a file named code.bin.

I then ran the ARM disassembler on that file using the command:

Code:

arm-none-eabi-objdump -bbinary -marm --disassemble-all code.bin -Mforce-thumb
and it will generate this output:

Code:

code.bin:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:	b5f2      	push	{r1, r4, r5, r6, r7, lr}
   2:	68c3      	ldr	r3, [r0, #12]
   4:	2404      	movs	r4, #4
   6:	43a3      	bics	r3, r4
   8:	60c3      	str	r3, [r0, #12]
   a:	1852      	adds	r2, r2, r1
   c:	3a01      	subs	r2, #1
   e:	4291      	cmp	r1, r2
  10:	d80a      	bhi.n	0x28
  12:	2480      	movs	r4, #128	; 0x80
  14:	6803      	ldr	r3, [r0, #0]
  16:	4223      	tst	r3, r4
  18:	d0fc      	beq.n	0x14
  1a:	4291      	cmp	r1, r2
  1c:	d100      	bne.n	0x20
  1e:	b672      	cpsid	i
  20:	780b      	ldrb	r3, [r1, #0]
  22:	3101      	adds	r1, #1
  24:	6043      	str	r3, [r0, #4]
  26:	e7f2      	b.n	0xe
  28:	2440      	movs	r4, #64	; 0x40
  2a:	6803      	ldr	r3, [r0, #0]
  2c:	4223      	tst	r3, r4
  2e:	d0f1      	beq.n	0x14
  30:	68c3      	ldr	r3, [r0, #12]
  32:	2404      	movs	r4, #4
  34:	4323      	orrs	r3, r4
  36:	60c3      	str	r3, [r0, #12]
  38:	b662      	cpsie	i
  3a:	bdf2      	pop	{r1, r4, r5, r6, r7, pc}
  3c:	0000      	movs	r0, r0
	...
which correlates exactly with the inline assembler we wrote. MicroPython added a push instruction at the beginning and a pop instruction at the end, but the rest is from the inline assembler.
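When matching the hex string against the listing by eye, note that Thumb instructions are stored as little-endian 16-bit halfwords. That is why the first two bytes of the dump, f2 b5, show up as b5f2 in the disassembly. 32-bit Thumb-2 encodings such as movw and movt occupy two consecutive halfwords. This host-side sketch (thumb_halfwords is a hypothetical helper) does the byte swapping:

```python
import binascii
import struct


def thumb_halfwords(hex_str):
    """Split hex-encoded Thumb machine code into little-endian 16-bit halfwords."""
    data = binascii.unhexlify(hex_str)
    count = len(data) // 2  # a trailing odd byte, if any, is ignored
    return list(struct.unpack("<%dH" % count, data[: count * 2]))


# First six bytes of the dump above: push, ldr, movs
print(["%04x" % hw for hw in thumb_halfwords(b"f2b5c3680424")])
# ['b5f2', '68c3', '2404']
```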

If we take the PC where the hard fault occurred (200068f2) and subtract the value printed on the 'machine code at: 0x200068f0' line (immediately before the first line of dashes), we get an offset of 2. Looking at the disassembly listing, this means it crashed on the very first instruction of my inline assembler, right after the push that MicroPython inserted:

Code:

ldr	r3, [r0, #12]
which loads the 32-bit word at address r0 + 12 into r3. Fortunately, the hard fault dump includes registers R0 to R3, and we see that R0 has the value c0011400.
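The offset arithmetic and the lookup in the listing are easy to script on the host once the UART 4 output has been captured to a file. The sketch below is host-side CPython and assumes the dump and objdump layouts shown in this post; parse_fault_dump and instruction_at are hypothetical helpers. For a precise bus fault (CFSR bit 9, PRECISERR, is set in the 00008200 value above) the stacked PC is the address of the faulting instruction, so PC minus the 'machine code at:' value is the offset to look up.

```python
def parse_fault_dump(text):
    """Collect 'NAME  hexvalue' lines of a hard fault dump into a dict of ints."""
    regs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # e.g. 'HardFault' or 'number of args: 3'
        try:
            regs[parts[0]] = int(parts[1], 16)
        except ValueError:
            continue  # e.g. 'FATAL ERROR:' is not a register line
    return regs


def instruction_at(listing, offset):
    """Return the objdump text for the instruction at `offset`, or None."""
    for line in listing.splitlines():
        addr, sep, rest = line.partition(":")
        if not sep:
            continue
        try:
            line_offset = int(addr.strip(), 16)
        except ValueError:
            continue  # header lines such as '00000000 <.data>:'
        if line_offset == offset:
            return rest.strip()
    return None


# Excerpts from the report and the disassembly above
dump = "R0    c0011400\nPC    200068f2\nBFAR  c001140c\nFATAL ERROR:\nHardFault\n"
listing = (
    "   0:\tb5f2      \tpush\t{r1, r4, r5, r6, r7, lr}\n"
    "   2:\t68c3      \tldr\tr3, [r0, #12]\n"
    "   4:\t2404      \tmovs\tr4, #4\n"
)
code_addr = 0x200068F0  # the 'machine code at:' value printed by inspect()

regs = parse_fault_dump(dump)
offset = regs["PC"] - code_addr
print(offset, instruction_at(listing, offset))
```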

The hard fault was caused by trying to access memory location c001140c (c0011400 + 12), and this is confirmed by the contents of the BFAR (Bus Fault Address Register).
r0 is the first argument to the _write_packet function and is supposed to hold the address of the USART6 peripheral, 40011400.

Ah-ha. MicroPython passed the value as a small int, which means that bits 30 and 31 are always the same (the 31-bit value gets sign-extended to 32 bits), so we need to modify our routine to mask off the top bit (bit 31).
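To see why R0 held c0011400 and why the faulting address was c001140c, here is a host-side model of what happens when 0x40011400 is squeezed into a 31-bit signed small int on a 32-bit port and then read back as a 32-bit register value. small_int_to_word is a hypothetical helper that mirrors the arithmetic only, not MicroPython's actual conversion code. Masking off bit 31 is safe here because the real USART6 address has bit 31 clear.

```python
def small_int_to_word(value):
    """Model a value stored as a 31-bit signed small int, then read back
    as a 32-bit register: bit 30 is copied into bit 31 (sign extension)."""
    v = value & 0x7FFFFFFF  # only 31 bits survive
    if v & 0x40000000:      # bit 30 set: negative as a 31-bit number
        v -= 0x80000000
    return v & 0xFFFFFFFF   # the unsigned 32-bit value seen in r0


usart6 = 0x40011400
r0 = small_int_to_word(usart6)
print(hex(r0))               # 0xc0011400, matching R0 in the dump
print(hex(r0 + 12))          # 0xc001140c, matching BFAR (CR1 is at offset 12)
print(hex(r0 & 0x7FFFFFFF))  # 0x40011400 once bit 31 is masked off
```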

Note: MicroPython has since been fixed so that it can now pass full 32-bit integers to inline assembler functions, and these extra lines of masking are no longer required. At the time, I added these lines to the beginning of my inline assembler function:

Code:

    movw(r3, 0xffff)                # uart(r0) &= 0x7fffffff
    movt(r3, 0x7fff)                #
    and_(r0, r3)                    #
and reran it. This caused another hard fault to occur:

Code:

function object at: 0x20003500
number of args: 3
machine code at: 0x20006940
----------
import binascii
with open('code.bin', 'wb') as f:
    f.write(binascii.unhexlify(b'f2b54ff6ff73c7f6ff731840c3680424a343c3605218013a91420ad8802403682342fcd0914200d172b6\
0b7801314360f2e7402403682342f1d0c36804242343'))
----------
R0    40011400
R1    20020000
R2    20003ac5
R3    000000c0
R12   0802efad
LR    0802f01b
PC    2000696a
XPSR  21000000
HFSR  40000000
CFSR  00008200
BFAR  20020000

FATAL ERROR:
HardFault
Now we have a PC of 2000696a and a machine code address of 0x20006940, yielding an offset of 0x2a:

Code:

  26:	d100      	bne.n	0x2a
  28:	b672      	cpsid	i
  2a:	780b      	ldrb	r3, [r1, #0]
  2c:	3101      	adds	r1, #1
  2e:	6043      	str	r3, [r0, #4]
This time the BFAR shows 20020000, which is the first address past the end of RAM (the pyboard's STM32F405 has 128 KB of main SRAM starting at 0x20000000), and R1 also contains 20020000. Somehow the code didn't stop at the right place and kept zipping right through memory until it tried to access beyond the end of RAM.
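Both dumps in this post report CFSR = 00008200 and HFSR = 40000000, and decoding those bits explains why BFAR can be trusted each time. PRECISERR (bit 9) means a precise data bus error, so the stacked PC is the faulting instruction, and BFARVALID (bit 15) means BFAR holds the faulting address. HFSR bit 30 (FORCED) says the bus fault was escalated to a hard fault. The sketch below is host-side Python using the Cortex-M4 bit positions; decode_bus_fault is a hypothetical helper, and the other CFSR fields (MemManage and UsageFault) are not decoded.

```python
# Bus fault bit positions for the Cortex-M4 (as on the STM32F4).
BUS_FAULT_BITS = {  # BFSR, which occupies CFSR bits 8..15
    8: "IBUSERR",
    9: "PRECISERR",
    10: "IMPRECISERR",
    11: "UNSTKERR",
    12: "STKERR",
    13: "LSPERR",
    15: "BFARVALID",
}
HFSR_FORCED = 1 << 30  # a configurable fault was escalated to HardFault


def decode_bus_fault(cfsr):
    """Return the names of the bus fault bits that are set in a CFSR value."""
    return [name for bit, name in sorted(BUS_FAULT_BITS.items()) if cfsr & (1 << bit)]


cfsr, hfsr = 0x00008200, 0x40000000  # identical in both dumps in this post
print(decode_bus_fault(cfsr))        # ['PRECISERR', 'BFARVALID']
print(bool(hfsr & HFSR_FORCED))      # True
```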

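After looking at the assembler some more, I noticed that I had a copy/paste error at the end of my wait_tx_complete loop: it branched to wait_txe rather than wait_tx_complete. Whenever TC was still clear, execution jumped back to wait_txe (which, with r4 now holding 0x40, waited for TC), fell through into the code that writes the next byte (now read from past the end of buf), and then came back around to wait_tx_complete, so r1 kept advancing through memory. I fixed the branch and the routine now works properly.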

The _write_packet routine is part of some code I've been working on for controlling Bioloid servos, which drive my six-legged walker (or my brother's four-legged walker named Roz: http://forum.micropython.org/viewtopic. ... &hilit=Roz )

I needed the inline assembler at all because, when running on the Espruino Pico (which only runs at 84 MHz), the delay between the uart.write call and re-enabling the receiver was too long, and I was missing the response packet from the servo. With the inline assembler routine I never miss a response packet.

pythoncoder

Re: Debugging Hard Faults in Inline Assembler (Long)

Post by pythoncoder » Sun Jan 10, 2016 8:01 am

Very nice. That would have saved me a lot of time ;)
Peter Hinch
