Inline Assembler - Am I doing it right?

RP2040-based microcontroller boards running MicroPython.
Target audience: MicroPython users with RP2040 boards.
This does not include conventional Linux-based Raspberry Pi boards.
samneggs
Posts: 20
Joined: Tue Jun 15, 2021 12:57 am

Post by samneggs » Wed Nov 24, 2021 1:42 am

I naively converted a viper routine to inline assembly, expecting a huge performance improvement. It ran 30% faster, which is typical of the other routines I have converted. So the first question is: why only 30%? If I'm doing the same thing as the viper emitter, maybe the speed should be about the same, and I doubt I write better assembly than the viper emitter does. Maybe there's some MicroPython overhead that isn't executed during my assembly routine?
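To check whether the 30% figure holds across routines, one option is to time the viper and asm versions the same way, keeping the best of several runs to reduce noise from garbage collection and interrupts. A minimal sketch using MicroPython's `time.ticks_us()` and `time.ticks_diff()`; `best_time_us` and `speedup` are hypothetical helper names, and the desktop fallback exists only so the code can be tried off the board.

```python
# Timing helpers for comparing a viper routine against its asm_thumb port.
try:
    from time import ticks_us, ticks_diff  # available on MicroPython
except ImportError:
    # Desktop-Python fallback (assumption: only for trying the helpers off-board).
    from time import perf_counter_ns

    def ticks_us():
        return perf_counter_ns() // 1000

    def ticks_diff(new, old):
        return new - old


def best_time_us(func, repeats=5):
    """Call the zero-argument `func` `repeats` times; return the fastest run in microseconds."""
    best = None
    for _ in range(repeats):
        start = ticks_us()
        func()
        elapsed = ticks_diff(ticks_us(), start)
        if best is None or elapsed < best:
            best = elapsed
    return best


def speedup(old_func, new_func, repeats=5):
    """Ratio of best times; values above 1.0 mean new_func was faster than old_func."""
    t_old = best_time_us(old_func, repeats)
    t_new = best_time_us(new_func, repeats)
    return t_old / max(t_new, 1)  # guard against a 0-microsecond reading
```

On the board this could be called as `speedup(lambda: fill_triangle(buf, 240, off, 240), lambda: color_asm(buf, color_l2, control))`, where `buf`, `off` and the argument values are placeholders. Both routines draw into the buffer in place, so for a fair comparison the buffer contents should be the same before each timed call.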

Second: how should variables be handled in assembly? In short routines I use r0-r7, with some holding values that don't change and others used as scratch registers. In longer code I used the stack, but it's tricky to keep my pops aligned with my pushes. At one point I tried using one r-register for all the booleans and bit-masking out whichever one I needed each time. Then I settled on loading and saving from an array that I pass to the routine, where even each boolean uses a 16-bit half-word. Here is the viper code followed by my conversion:

Code:

@micropython.viper
def fill_triangle(obj_source, width_source:int,color_offset:int, y_start:int):
    source=ptr16(obj_source)
    c=ptr16(color_l2)
    sine=ptr16(isin)
    y=y_start-2 
    start_inside = 0
    #y_tot = y - width_source//2
    #color_offset-=10
    while y:  
        x=width_source        
        inside=0
        erase = 0
        old_color=0
        color = 0
        while x:
            i=y*width_source+x            
            if erase:
                source[i]=c[(y+((color_offset*x)>>8))>>2]      # erase color with background
                x-=1
                continue            
            old_color=color
            color=source[i]
            if old_color and not color:                        # transition off of line
                if not inside:
                    start_inside = x                    
                    source[i]=c[(y+((color_offset*x)>>8))>>2]  # fill outside
                inside^=1
            elif inside:
                source[i]=c[((start_inside-x)>>2)+32]                    # fill inside
                if x==1:
                    inside = 0       
                    x=start_inside                             # go back to last edge
                    erase =1                                   # erase to end of row
            else:
                source[i]= c[(y+((color_offset*x)>>8))>>2]
            x-=1  
        y-=1



And the assembly version:

Code:

import array
from micropython import const

SCREEN_CHUNK_H  =const(240)
CTL_HEIGHT      =const(0)
CTL_COLOR_OFFSET=const(2)
CTL_OLD_COLOR   =const(4)
CTL_NEW_COLOR   =const(6)
CTL_START_INSIDE=const(8)
CTL_INSIDE      =const(10)
CTL_ERASE       =const(12)

# control fields, one 16-bit half-word each: CTL_HEIGHT, CTL_COLOR_OFFSET, CTL_OLD_COLOR, CTL_NEW_COLOR, CTL_START_INSIDE, CTL_INSIDE, CTL_ERASE
control=array.array('h',(SCREEN_CHUNK_H,0,0,0,0,0,0))   # 7 entries, one per CTL_* offset (0..12)
@micropython.asm_thumb        
def color_asm(r0,r1,r2):         # r0=screen address,r1= color_l2 address, r2 = control
    ldrh(r3, [r2, CTL_HEIGHT])    # r3 = working height from control
    label(HLOOP)                  # height loop
    mov(r4,0)                     # 
    strh(r4, [r2, CTL_INSIDE])    # inside=0
    strh(r4, [r2, CTL_OLD_COLOR]) # old_color=0   
    strh(r4, [r2, CTL_NEW_COLOR]) # new_color=0
    strh(r4, [r2, CTL_ERASE])     # erase=0
    ldrh(r6, [r2, CTL_HEIGHT])    # r6 = working width from control (same as height)
    label(WLOOP)                  # width loop  

    ldrh(r5, [r2, CTL_ERASE])     # recolor background
    cmp(r5,0)
    beq(NOT_ERASE)
    bl(BACKCOLOR)
    b(NEXT)        
    label(NOT_ERASE)
    
    ldrh(r5, [r2, CTL_NEW_COLOR]) # r5 =  new_color
    strh(r5, [r2, CTL_OLD_COLOR]) # store in old_color
    
    bl(SCREEN_ADDR)               # r4= screen addr
    ldrh(r5, [r4, 0])             # get color
    strh(r5, [r2, CTL_NEW_COLOR]) # store in new_color
    
    cmp(r5,0)    
    beq(CHECK_INSIDE)                # skip if =0
    
    
    ldrh(r5, [r2, CTL_OLD_COLOR])    # r5 = old_color
    cmp(r5,0)
    bgt(CHECK_INSIDE)                # skip if >0
       
    ldrh(r4, [r2, CTL_INSIDE])       # load inside    
    cmp(r4,1)
    beq(INVERT_INSIDE) 
    strh(r6, [r2, CTL_START_INSIDE]) # start_inside = x
    bl(BACKCOLOR)
    
    label(INVERT_INSIDE)             # inside^=1
    ldrh(r4, [r2, CTL_INSIDE])
    mov(r5,1)
    eor(r4,r5)
    strh(r4, [r2, CTL_INSIDE])
    b(NEXT)
    
    label(CHECK_INSIDE)
    ldrh(r4, [r2, CTL_INSIDE])       # inside==1?
    cmp(r4,0)
    beq(OUTSIDE)
    label(INSIDE_COLOR)
    cmp(r6,1)                        
    bne(SKIP_ERASE)
    mov(r4,0)
    strh(r4, [r2, CTL_INSIDE])       # inside = 0
    ldrh(r6, [r2, CTL_START_INSIDE]) # x=start_inside
    mov(r4,1)
    strh(r4, [r2, CTL_ERASE])        # erase = 1 (r4 holds 1)
    
    label(SKIP_ERASE)    
    ldrh(r5, [r2, CTL_START_INSIDE]) # start_inside
    sub(r5,r5,r6)                    # start_inside-x
    asr(r5,r5,1)                     # (start_inside-x)>>1
    mov(r4,64)                       # +32 half-words = +64 bytes, matching the viper c[...+32]
    add(r5,r5,r4)
    mov(r4, 0x1)                     # 
    bic(r5,r4)                       # align to even bit     
    bl(SCREEN_ADDR)                  # r4= screen addr
    add(r5,r5,r1)                    # color_l2[r5]
    ldrh(r5, [r5, 0])
    strh(r5, [r4, 0])                # write pixel r5 to address r4
    b(NEXT)

    label(OUTSIDE)
    bl(BACKCOLOR)
    
    label(NEXT)    
    sub(r6,1)                  # dec working width 
    bgt(WLOOP)
    sub(r3,1)                  # dec working height
    bgt(HLOOP)
    b(EXIT)
    
    label(BACKCOLOR)           # uses r4,r5,r6   calcs and writes pixel    
    ldrh(r5, [r2, CTL_COLOR_OFFSET])  # r5 = color_offset from control    
    mul(r5,r6)                 # color_offset*x
    asr(r5,r5,8)               # (color_offset*x)>>8)
    add(r5,r5,r3)              # (y+((color_offset*x)>>8)
    asr(r5,r5,1)               # >>1; with bit 0 cleared below = byte offset of c[(y+((color_offset*x)>>8))>>2]
    mov(r4, 0x1)               # 
    bic(r5,r4)                 # align to even bit 
    add(r5,r5,r1)              # color_l2 + (y+((color_offset*x)>>8))>>2    
    ldrh(r5, [r5, 0])          # r5 =  color[r5]
    
    label(SCREEN)              # writes r5 to screen address
    ldrh(r4, [r2, CTL_HEIGHT]) # height
    mul(r4,r3)                 # y*height    
    add(r4,r4,r6)              # (y*height)+x
    add(r4,r4,r4)              # double for 2 bytes
    add(r4,r4,r0)              # add to screen address    
    strh(r5, [r4, 0])          # write pixel r5 to address r4
    bx(lr)

    label(SCREEN_ADDR)         # returns screen address in r4
    ldrh(r4, [r2, CTL_HEIGHT]) # height
    mul(r4,r3)                 # y*height    
    add(r4,r4,r6)              # (y*height)+x
    add(r4,r4,r4)              # double for 2 bytes
    add(r4,r4,r0)              # add to screen address     
    bx(lr)

    label(EXIT)
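On the array approach: ldrh/strh do no bounds checking, so the control array needs at least one 16-bit entry for every CTL_* offset the routine touches. The highest here is CTL_ERASE = 12, i.e. index 6, so seven entries. A small sketch that sizes the array from the offsets instead of by hand; `make_control` and `ctl_get` are hypothetical helper names, and the offsets are plain ints rather than `const()` so the sketch also runs on desktop Python.

```python
import array

# Byte offsets as in the post: one 16-bit half-word per field.
CTL_HEIGHT = 0
CTL_COLOR_OFFSET = 2
CTL_OLD_COLOR = 4
CTL_NEW_COLOR = 6
CTL_START_INSIDE = 8
CTL_INSIDE = 10
CTL_ERASE = 12
CTL_OFFSETS = (CTL_HEIGHT, CTL_COLOR_OFFSET, CTL_OLD_COLOR, CTL_NEW_COLOR,
               CTL_START_INSIDE, CTL_INSIDE, CTL_ERASE)


def make_control(height, color_offset):
    """Hypothetical helper: allocate one entry per CTL_* field, then set the inputs."""
    n_fields = max(CTL_OFFSETS) // 2 + 1      # highest byte offset -> entry count
    ctl = array.array('h', [0] * n_fields)    # 'h' = signed 16-bit, as in the post
    ctl[CTL_HEIGHT // 2] = height
    ctl[CTL_COLOR_OFFSET // 2] = color_offset
    return ctl


def ctl_get(ctl, offset):
    """Read a field by the same byte offset the assembler uses.

    ldrh zero-extends, so a negative value stored in this signed array
    is seen by the asm code as value + 65536.
    """
    return ctl[offset // 2]
```

Used as `control = make_control(SCREEN_CHUNK_H, color_offset)` before calling `color_asm(screen, color_l2, control)`. Afterwards, `ctl_get(control, CTL_START_INSIDE)` and friends show the state the asm left behind, which can help when debugging.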

Is this typical? Is it ridiculous? Are there other ways to handle variables in assembly?

Sam
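In BACKCOLOR the viper lookup `c[(y+((color_offset*x)>>8))>>2]` becomes `asr` by 1 followed by `bic` with 1. This works because element k of a ptr16 array sits at byte offset 2k, and shifting right by one and then clearing bit 0 gives exactly (v>>2)*2. A quick desktop-Python check of that identity; the two function names are hypothetical, for illustration only.

```python
def viper_byte_offset(v):
    """Byte offset of element v >> 2 in a ptr16 array (2 bytes per element)."""
    return (v >> 2) * 2


def asm_byte_offset(v):
    """What BACKCOLOR computes: asr(r5, r5, 1), then bic with 1 to clear bit 0."""
    return (v >> 1) & ~1


# Python's >> on ints rounds toward minus infinity, like Thumb's asr,
# so the two agree for negative values as well.
for v in range(-1024, 1024):
    assert asm_byte_offset(v) == viper_byte_offset(v), v
```

By the same rule, a constant added to the half-word index (such as the `+32` in the viper inside fill) corresponds to twice that many bytes when it is added to a byte offset instead.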

Re: Inline Assembler - Am I doing it right?

Post by samneggs » Wed Nov 24, 2021 3:17 am

Here is another way to embed a variable, using the data() directive and the program counter (pc).
Maybe it's more trouble than it's worth.

Code:

@micropython.asm_thumb        
def test_asm():
    mov(r0,pc)         # r0 = pc, which reads as this instruction's address + 4: the start of the data below
    b(SKIP)
    
    data(2,98,0)   # put initial variable data here
    align(2)
    
    label(SKIP)
    ldrh(r1, [r0, 0])  # read variable
    add(r1,r1,1)       # add 1 
    strh(r1, [r0, 0])  # write variable
    ldrh(r0, [r0, 0])  # read variable again

    
print(test_asm()) # prints 99 
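`data(2, 98, 0)` lays down two 2-byte values, 98 and 0, stored little-endian (the RP2040's Cortex-M0+ cores are little-endian). The desktop-Python model below mimics that layout and the ldrh / add / strh / ldrh sequence with `struct`; `blob` and `bump` are hypothetical names, and this is a model of the idea, not of the assembler itself. Because the value lives in the function's own code buffer, the real routine should presumably also keep the new value between calls, which the successful write on the first call suggests is possible.

```python
import struct

# data(2, 98, 0): two little-endian half-words, 98 followed by 0.
blob = bytearray(struct.pack('<2H', 98, 0))


def bump(buf, offset=0):
    """Mimic the ldrh / add / strh / ldrh sequence on the half-word at `offset`."""
    (value,) = struct.unpack_from('<H', buf, offset)  # ldrh(r1, [r0, 0])
    value = (value + 1) & 0xFFFF                      # add(r1, r1, 1); strh keeps 16 bits
    struct.pack_into('<H', buf, offset, value)        # strh(r1, [r0, 0])
    return struct.unpack_from('<H', buf, offset)[0]   # ldrh(r0, [r0, 0])
```

In the model, the first `bump(blob)` returns 99 and a second call returns 100, because the incremented value is written back into the same bytes.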
