How to test the sign of a floating-point register in MicroPython assembly language


I'm learning assembler for MicroPython (ARM Thumb2 instruction set for PyBoard).

Is there a quicker way to check the sign (positive/negative) of an FPU register (s0) than this?

@micropython.asm_thumb
def float_array_abs(r0, r1):
    label(LOOP)
    vldr(s0, [r0, 0])
    vmov(r2, s0)         # 1
    cmp(r2, 0)           # 2
    itt(mi)              # 3
    vneg(s0, s0)
    vstr(s0, [r0, 0])
    add(r0, 4)
    sub(r1, 1)
    bgt(LOOP)

This works, but it doesn't seem like the 'right' solution (I'm not sure the sign of r2 always matches the sign of s0), and I suspect it must be possible in fewer than two instructions.

UPDATE 1:

Based on the comments (thanks) I have improved the speed of the code further:

@micropython.asm_thumb
def float_array_abs1(r0, r1):
    label(LOOP)
    ldr(r2, [r0, 0])
    cmp(r2, 0)         # this works for some reason
    bge(SKIP)
    vmov(s0, r2)
    vneg(s0, s0)
    vstr(s0, [r0, 0])  # this can be skipped if not negative
    label(SKIP)
    add(r0, 4)
    sub(r1, 1)
    bgt(LOOP)

But it still leaves the question, is this a robust way of determining sign of an FP value?

For reference here are the byte representations of four float values on my system:

-1.0 0xbf800000
-0.0 0x80000000
 0.0 0x00000000
 1.0 0x3f800000

I guess if this is hardware dependent then I shouldn't be relying on this to determine the sign...
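For anyone who wants to reproduce the table above, here is a minimal sketch in plain Python (not Thumb assembly). It assumes the IEEE 754 binary32 layout that `struct`'s standard `"<f"` format uses; the helper names `float_bits` and `as_signed32` are made up for illustration. It prints each bit pattern together with its value when read as a signed 32-bit integer, which is what `cmp(r2, 0)` effectively tests:

```python
import struct

def float_bits(x):
    """Return the binary32 bit pattern of x as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def as_signed32(bits):
    """Reinterpret an unsigned 32-bit pattern as a two's-complement int."""
    return bits - (1 << 32) if bits & 0x80000000 else bits

for value in (-1.0, -0.0, 0.0, 1.0):
    bits = float_bits(value)
    # The top bit is the float's sign bit and also the integer's sign bit.
    print("%4.1f 0x%08x signed=%d" % (value, bits, as_signed32(bits)))
```

Under that assumption the float's sign bit and the integer's sign bit are the same bit, so the integer reading is negative exactly when the sign bit is set, including for -0.0.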

I think this might be the 'proper' way to do it (i.e. proper FPU comparison):

@micropython.asm_thumb
def float_array_abs2(r0, r1):
    mov(r2, 0)
    vmov(s1, r2)
    label(LOOP)
    vldr(s0, [r0, 0])
    vcmp(s0, s1)
    vmrs(APSR_nzcv, FPSCR)
    itt(mi)
    vneg(s0, s0)
    vstr(s0, [r0, 0])
    add(r0, 4)
    sub(r1, 1)
    bgt(LOOP)

But I timed this and it is 11% slower than the code above (float_array_abs1). So it would be nice to use the earlier code if it is a reliable solution.
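One difference between the two tests, apart from speed, is how they treat -0.0. The plain-Python sketch below is only a model of the two conditions, not the assembly itself; it assumes IEEE 754 binary32 and uses the hypothetical names `negated_by_integer_test` and `negated_by_float_test`. A floating-point comparison treats -0.0 as equal to 0.0, while the integer comparison sees the sign bit:

```python
import struct

def negated_by_integer_test(x):
    """Model of ldr + cmp(r2, 0) + bge: is the bit pattern a negative int?"""
    bits = struct.unpack("<i", struct.pack("<f", x))[0]
    return bits < 0

def negated_by_float_test(x):
    """Model of vcmp against 0.0 followed by the mi condition."""
    return x < 0.0

for value in (-1.0, -0.0, 0.0, 1.0):
    print(value, negated_by_integer_test(value), negated_by_float_test(value))
```

In this model the integer test flips -0.0 to +0.0, whereas the floating-point compare leaves it alone. Whether that matters depends on what later code does with signed zeros.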

UPDATE 2:

@Ped7g proposed ANDing the value with 0x7FFFFFFF to clear the sign bit (see comments).

I tested this and it does work. Here is the code:

@micropython.asm_thumb
def float_array_abs3(r0, r1):
    movwt(r3, 0x7FFFFFFF)
    label(LOOP)
    ldr(r2, [r0, 0])
    and_(r2, r3)
    str(r2, [r0, 0])
    add(r0, 4)
    sub(r1, 1)
    bgt(LOOP)

CORRECTION: It is faster than float_array_abs1 above. This appears to be the best solution but is it robust?

assembly
floating-point
arm
micropython
asked on Stack Overflow Apr 1, 2018 by Bill • edited Apr 2, 2018 by Bill

1 Answer


Masking the sign bit to 0 with an AND is safe and optimal for IEEE 754 binary floating-point formats like float and double.

It will convert -Inf to +Inf as desired. It will convert -NaN into +NaN, but it's still a NaN.

NaN is indicated by all-ones exponent and non-zero significand. Inf is all-ones exponent with zero significand. (https://en.wikipedia.org/wiki/Single-precision_floating-point_format)

Most code doesn't care about the payload or sign of a NaN, just that it is NaN, so clearing the sign bit is fine.
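As a rough check of the special cases above, here is a small plain-Python sketch (not ARM code) that applies the same 0x7FFFFFFF mask to binary32 bit patterns. The names `ABS_MASK` and `abs_by_mask` are illustrative only, and the sketch assumes `struct`'s `"<f"` format is IEEE 754 binary32:

```python
import math
import struct

ABS_MASK = 0x7FFFFFFF  # every bit set except the sign bit

def abs_by_mask(x):
    """Clear the sign bit of x's binary32 pattern and return the float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & ABS_MASK))[0]

print(abs_by_mask(-1.5))                # finite value loses its sign
print(abs_by_mask(float("-inf")))       # -Inf becomes +Inf
print(abs_by_mask(-0.0))                # -0.0 becomes +0.0
print(math.isnan(abs_by_mask(float("nan"))))  # a NaN stays a NaN
```

The exponent and significand bits are untouched by the mask, which is why Inf stays Inf and NaN stays NaN.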


ARM can do this with integer SIMD NEON instructions for 4 single-precision floats at a time. I don't know whether VFP (the non-NEON hardware FPU) supports an AND instruction.

Related: "Fastest way to compute absolute value using SSE". AND is the best way on x86 as well.


BTW, doing this in a separate loop is probably a waste of memory bandwidth. Doing the absolute value on the fly in loops that read the array is probably best, unless you read this array many times after writing it once. At least if you can do the AND in an FP register. Loading into an integer register for AND and then moving from integer to FP for math instructions would be bad.

Usually you want more computational intensity in your loops (do more ALU work for each load from memory).

answered on Stack Overflow Apr 5, 2018 by Peter Cordes

User contributions licensed under CC BY-SA 3.0