A couple of questions re maths

pythoncoder · Post by **pythoncoder** » Wed Mar 25, 2015 8:20 am

Firstly, is it feasible to provide access to the four arithmetic operations in the FPU in assembler? I've been unable to locate any documentation about how the FPU is accessed in code. If this could be achieved it would presumably provide a means for interrupt handlers to process floats.

Secondly, the file stmhal/arm_math.h suggests that there is support for a range of useful maths functions including IIR and FIR filtering, PID controllers, matrix and vector operations - many of application in embedded projects. Is there any scope for accessing at least a subset of these from a MicroPython library, or would it bust the vital "Micro" attribute to do so?

dhylands · Post by **dhylands** » Wed Mar 25, 2015 6:24 pm

When I'm not sure how to code something in assembler, I normally let the compiler do it for me. For example, this C file:

Code: Select all

float foo(float a, float b) {
    return a + b;
}

when compiled using these flags:

Code: Select all

arm-none-eabi-gcc -c -mthumb -mtune=cortex-m4 -mabi=aapcs-linux -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard -gstabs -Wa,-ahldms=float.cod float.c

will produce this assembler:

Code: Select all

   1:float.c       **** float foo(float a, float b) {
  78              	.LM0:
  79              	.LFBB1:
  80              		@ args = 0, pretend = 0, frame = 8
  81              		@ frame_needed = 1, uses_anonymous_args = 0
  82              		@ link register save eliminated.
  83 0000 80B4     		push	{r7}
  84 0002 83B0     		sub	sp, sp, #12
  85 0004 00AF     		add	r7, sp, #0
  86 0006 87ED010A 		fsts	s0, [r7, #4]
  87 000a C7ED000A 		fsts	s1, [r7]
   2:float.c       ****     return a + b;
  89              	.LM1:
  90 000e 97ED017A 		flds	s14, [r7, #4]
  91 0012 D7ED007A 		flds	s15, [r7]
  92 0016 77EE277A 		fadds	s15, s14, s15
   3:float.c       **** }
  94              	.LM2:
  95 001a B0EE670A 		fcpys	s0, s15
  96 001e 0C37     		adds	r7, r7, #12
  97 0020 BD46     		mov	sp, r7
  98              		@ sp needed
  99 0022 5DF8047B 		ldr	r7, [sp], #4
 100 0026 7047     		bx	lr

Post by **Damien** » Wed Mar 25, 2015 9:53 pm

pythoncoder wrote:Firstly, is it feasible to provide access to the four arithmetic operations in the FPU in assembler?

I think more than four instructions are needed, because you also need mov/load/store instructions. Implementing all this in the inline assembler is entirely feasible, but not on my agenda. Feel free to submit a patch for it though!

pythoncoder wrote:Secondly, the file stmhal/arm_math.h suggests that there is support for a range of useful maths functions including IIR and FIR filtering, PID controllers, matrix and vector operations - many of application in embedded projects. Is there any scope for accessing at least a subset of these from a MicroPython library, or would it bust the vital "Micro" attribute to do so?

Sounds like a perfect job for "loadable native modules" https://github.com/micropython/micropython/issues/583

pythoncoder · Post by **pythoncoder** » Thu Mar 26, 2015 6:33 am

@dhylands That's a nice approach. Looking at the assembler code it does look as if a bit of magic is involved in getting the float arguments into the routine: as far as I can see they are already in S0 and S1 rather than being on the stack. To satisfy my curiosity I'll try your technique to investigate how that occurs but I fear that applying the knowledge or submitting patches is beyond my competence. My C is rusty and to date I've failed to follow the MicroPython assembler source beyond a certain point.

@Damien
I appreciate that there are higher priorities than further enhancing the assembler, and the current subset is very capable. Discovering all that C maths code did pique my curiosity, though.

pythoncoder · Post by **pythoncoder** » Wed Apr 01, 2015 7:14 am

To date I haven't got past first base with the FPU. Perhaps someone could point out where I'm going wrong. I started out with this integer code (which works), with the intention of adapting it for floats:

Code: Select all

import array

a = array.array("i", [3, 4, 0])
@micropython.asm_thumb
def mult(r0):
    mov(r3, r0)             # r3 = address of array[0]
    ldr(r0, [r3, 0])
    add(r3, 4)              # address of array[1]
    ldr(r1, [r3, 0])
    add(r3, 4)              # address of array[2]
    mul(r0, r1)
    str(r0, [r3, 0])

mult(a)
print(a)

I then compiled the following code using the flags @dhylands suggested:

Code: Select all

void foo(float *a, float *b, float *result) {
    *result = *a * *b ;
}

This yielded the following assembler (edited to show only the interesting bit):

Code: Select all

  84 0000 80B4     		push	{r7}
  85 0002 85B0     		sub	sp, sp, #20
  86 0004 00AF     		add	r7, sp, #0
  87 0006 F860     		str	r0, [r7, #12]
  88 0008 B960     		str	r1, [r7, #8]
  89 000a 7A60     		str	r2, [r7, #4]
   2:float.c       ****     *result = *a * *b ;
  91              	.LM1:
  92 000c FB68     		ldr	r3, [r7, #12]
  93 000e 93ED007A 		flds	s14, [r3]
  94 0012 BB68     		ldr	r3, [r7, #8]
  95 0014 D3ED007A 		flds	s15, [r3]
  96 0018 67EE277A 		fmuls	s15, s14, s15
  97 001c 7B68     		ldr	r3, [r7, #4]
  98 001e C3ED007A 		fsts	s15, [r3]
   3:float.c       **** }
 100              	.LM2:
 101 0022 1437     		adds	r7, r7, #20
 102 0024 BD46     		mov	sp, r7
 103              		@ sp needed
 104 0026 5DF8047B 		ldr	r7, [sp], #4
 105 002a 7047     		bx	lr

It seemed clear what was going on here so I used the resultant opcodes to modify my integer code for FP. Unfortunately the outcome is a crash.

Code: Select all

import array

a = array.array("f", [3.0, 4.0, 0.0])
# Use same register R3 as float.cod
@micropython.asm_thumb
def mult(r0):
    mov(r3, r0)             # r3 = address of array[0]
    data(2, 0x93ED, 0x007A) # flds	s14, [r3]
    add(r3, 4)              # address of array[1]
    data(2, 0xD3ED, 0x007A, 0x67EE, 0x277A) #flds	s15, [r3], fmuls s15, s14, s15
    add(r3, 4)              # address of array[2]
    data(2, 0xC3ED, 0x007A) # fsts	s15, [r3]

print(a[0] + a[1]) # Force a FP op to ensure FPU is enabled (?)
mult(a)
print(a)

The code crashes on execution of any of the floating point instructions. I fear I'm doing something very silly indeed, but I can't see it.

manitou · Post by **manitou** » Wed Apr 01, 2015 4:16 pm

No real help, but I modified your code to just load s15 and store it in array[1]:

Code: Select all

import array

a = array.array("f", [3.0, 4.0, 0.0])
# Use same register R3 as float.cod
@micropython.asm_thumb
def mult(r0):
    mov(r3, r0)             # r3 = address of array[0]
    data(2, 0xD3ED, 0x007A) # flds   s15, [r3]
    add(r3, 4)              # address of array[1]
    data(2, 0xC3ED, 0x007A) # fsts   s15, [r3]

print(a[0] + a[1]) # Force a FP op to ensure FPU is enabled (?)
mult(a)
print(a)

It doesn't crash, but what it prints is not comforting:

7.0
array('f', [3.0, 1.085875e-19, 0.0])

Most alterations to the assembler code result in a crash ...

pythoncoder · Post by **pythoncoder** » Thu Apr 02, 2015 8:24 am

Interesting: so it's updating the correct array element, but with garbage. There's clearly something we don't understand about the FPU. I notice that the assembler generated from C creates a 20-byte stack frame. It receives the pointer arguments in r0-r2 and shifts them into the stack frame before using them. I guess this is just compiler behaviour: creating local variables automatically. But why does it reserve 20 bytes of stack for 12 bytes of data? Does the FPU expect to be provided with a scratchpad on the stack? I'll experiment by creating a stack frame at the outset.

<later> No joy with a stack frame.

pythoncoder · Post by **pythoncoder** » Fri Apr 03, 2015 8:49 am

In case anyone else is interested in pursuing this, I've figured out the problem. The assembler listing shows the bytes making up each 16-bit word in memory (little-endian) order, which is the reverse of how the opcode is written in the manual. Consider the line:

Code: Select all

0000 80B4     		push	{r7}

In the ARM manual the T1 opcode is listed as B480, and the correct data statement to cause that instruction to be executed is:

Code: Select all

data(2, 0xB480)
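
The same swap can be applied mechanically to any opcode column copied from the listing. A minimal sketch, assuming desktop Python 3 (`bytes.fromhex` may not be available in MicroPython) and a made-up helper name `listing_to_halfwords`:

```python
def listing_to_halfwords(hex_bytes):
    """Convert an opcode column from the listing (bytes in memory order)
    into the 16-bit values expected by data(2, ...).

    listing_to_halfwords is a hypothetical helper, not part of MicroPython.
    """
    raw = bytes.fromhex(hex_bytes)
    if len(raw) % 2:
        raise ValueError("expected a whole number of 16-bit halfwords")
    # Each pair of bytes is one little-endian halfword.
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# "push {r7}" and "flds s14, [r3]" as they appear in the listing
print([hex(h) for h in listing_to_halfwords("80B4")])
print([hex(h) for h in listing_to_halfwords("93ED007A")])
```

Note that only the two bytes within each halfword are swapped; the order of the halfwords themselves stays as in the listing.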

The following code example works.

Code: Select all

import array

a = array.array("f", [3.0, 4.0, 0.0])
# Use same register R3 as float.cod
@micropython.asm_thumb
def mult(r0):
    mov(r3, r0)             # r3 = address of array[0]
    data(2, 0xED93, 0x7A00) # flds	s14, [r3]
    add(r3, 4)              # address of array[1]
    data(2, 0xEDD3, 0x7A00) # flds	s15, [r3]
    data(2, 0xEE67, 0x7A27) # fmuls s15, s14, s15
    add(r3, 4)              # address of array[2]
    data(2, 0xEDC3, 0x7A00) # fsts	s15, [r3]

mult(a)
print(a)
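
The `data()` words need not be copied from a listing by hand. The following is a hedged sketch for desktop Python that builds the two halfwords for the four single-precision arithmetic instructions. It assumes the VADD/VSUB/VMUL/VDIV field layout given in the ARMv7-M Architecture Reference Manual, so it is worth re-checking there before relying on it; only the add and multiply cases can be confirmed against opcodes shown in this thread. The name `encode_fp_arith` is made up for illustration.

```python
# First-halfword base value for each operation, plus the "op" bit
# (bit 6 of the second halfword) that distinguishes VSUB from VADD.
# Assumed from the ARMv7-M ARM; not taken from MicroPython itself.
_ARITH = {
    "add": (0xEE30, 0),
    "sub": (0xEE30, 1),
    "mul": (0xEE20, 0),
    "div": (0xEE80, 0),
}


def encode_fp_arith(op, sd, sn, sm):
    """Return (hw1, hw2) for '<op> s<sd>, s<sn>, s<sm>' (single precision)."""
    hw1_base, op_bit = _ARITH[op]
    for reg in (sd, sn, sm):
        if not 0 <= reg <= 31:
            raise ValueError("single-precision registers are s0..s31")
    # Each register number is split into a 4-bit field and a 1-bit field.
    hw1 = hw1_base | ((sd & 1) << 6) | (sn >> 1)
    hw2 = (((sd >> 1) << 12) | 0x0A00 | ((sn & 1) << 7)
           | (op_bit << 6) | ((sm & 1) << 5) | (sm >> 1))
    return hw1, hw2


# fmuls s15, s14, s15, as used in the working example
print([hex(h) for h in encode_fp_arith("mul", 15, 14, 15)])
```

The returned pair can be passed straight to the inline assembler as `data(2, hw1, hw2)`.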

A final request. So far I've been unable to locate a document describing the FPU's instruction set down to the level of actual opcode binaries. I'd greatly appreciate a pointer.

dhylands · Post by **dhylands** » Fri Apr 03, 2015 5:32 pm

I think that the document you want is titled: "ARM v7-M Architecture Reference Manual". It's a 916 page PDF.

I think you need to get an account on arm.com and then you can download it.

The URL I have is: https://silver.arm.com/download/ARM_and ... 7m_arm.pdf

It lists each of the FPU opcodes, broken down by bitfield, showing what the bits in the opcode mean and how they map to registers, etc.
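
As an illustration of that kind of bitfield breakdown, here is a small sketch for desktop Python that decodes the single-precision load/store opcodes used earlier in this thread. The field positions are an assumption based on the VLDR/VSTR encoding in the manual and should be checked against it; `decode_vldr_vstr` is a hypothetical name.

```python
def decode_vldr_vstr(hw1, hw2):
    """Split a single-precision VLDR/VSTR (flds/fsts) opcode into its fields.

    hw1 and hw2 are the two 16-bit halfwords as written in the manual,
    i.e. the values passed to data(2, hw1, hw2).
    """
    # Fixed bits: top byte 0xED with W (bit 5) clear, coprocessor field 0xA.
    if (hw1 & 0xFF20) != 0xED00 or (hw2 & 0x0F00) != 0x0A00:
        raise ValueError("not a single-precision VLDR/VSTR encoding")
    load = bool(hw1 & 0x10)       # L bit: 1 = load, 0 = store
    add = bool(hw1 & 0x80)        # U bit: 1 = add offset, 0 = subtract
    d = (hw1 >> 6) & 1            # low bit of the S register number
    rn = hw1 & 0x0F               # base (core) register
    vd = hw2 >> 12                # upper four bits of the S register number
    imm8 = hw2 & 0xFF             # offset in words
    return {
        "op": "vldr" if load else "vstr",
        "sd": (vd << 1) | d,
        "rn": rn,
        "offset": imm8 * 4 if add else -imm8 * 4,
    }


# flds s14, [r3] from the working example
print(decode_vldr_vstr(0xED93, 0x7A00))
```

Decoding an opcode this way before pasting it into a `data()` statement is a cheap check that the byte order and register numbers are what was intended.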

pythoncoder · Post by **pythoncoder** » Sat Apr 04, 2015 6:53 am

Thanks for that. Evidently there is more than one version of that document: the one I have, with exactly that title, is 705 pages and lacks the FPU info. Now downloaded.

MicroPython Forum (Archive)
