32-bit / 16-bit signed integer division without 32-bit registers?

Question

32-bit / 16-bit signed integer division without 32-bit registers?

I'm trying to divide a 32-bit signed integer by a 16-bit signed integer to get a signed 32-bit quotient and 16-bit remainder.

I'm targeting a 286 with no fpu.

I've already written code in the past to do 32-bit unsigned division:

DIV32 PROC

;DIVIDES A 32-BIT VALUE BY A 16-BIT VALUE.

;ALTERS AX
;ALTERS BX
;ALTERS DX

;EXPECTS THE 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE 16-BIT DIVISOR IN BX

;RETURNS THE 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX

    push di
    push si


    mov di, ax ;di -> copy of LSW of given dividend
    mov ax, dx ;ax -> MSW of given dividend
    xor dx, dx ;dx:ax -> 0:MSW  
    div bx     ;ax:dx -> ax=MSW of final quotient, dx=remainder

    mov si, ax ;si -> MSW of final quotient
    mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
    div bx     ;ax:dx -> ax=LSW of final quotient, dx=final remainder  
    mov bx, dx ;bx -> final remainder
    mov dx, si ;dx:ax -> final quotient


    pop si
    pop di
    ret

DIV32 ENDP

So far, i've tried to do the obvious thing and just modify my existing code by swapping out the XOR DX, DX with CWD and DIV with IDIV:

IDIV32 PROC

;DIVIDES A SIGNED 32-BIT VALUE BY A SIGNED 16-BIT VALUE.

;ALTERS AX
;ALTERS BX
;ALTERS DX

;EXPECTS THE SIGNED 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE SIGNED 16-BIT DIVISOR IN BX

;RETURNS THE SIGNED 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX

    push di
    push si


    mov di, ax ;di -> copy of LSW of given dividend
    mov ax, dx ;ax -> MSW of given dividend
    cwd        ;dx:ax -> 0:MSW, or ffff:MSW  
    idiv bx    ;ax:dx -> ax=MSW of final quotient, dx=remainder

    mov si, ax ;si -> MSW of final quotient
    mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
    idiv bx    ;ax:dx -> ax=LSW of final quotient, dx=final remainder  
    mov bx, dx ;bx -> final remainder
    mov dx, si ;dx:ax -> final quotient


    pop si
    pop di
    ret

IDIV32 ENDP

This works for some calculations, like -654,328/2=-327164 (0xfff60408/2=fffb0204). But it doesn't work with certain inputs, -131,076/2 returns the incorrect result of -2 remainder 0. A divisor of 1, or -1 causes a division error seemingly regardless of dividend.

I've tested many different positive and negative dividends and divisors in an attempt to find some kind of pattern of correct and incorrect results, i've noticed that it can't correctly return a result of -65538.

I have a hunch that i'm supposed to use CWD conditionally depending on input, but it seems like XOR DX, DX returns incorrect results more often. Either work when both divisor and dividend are positive and the quotient is less than 0x7fffffff.

assembly

x86-16

integer-division

asked on Stack Overflow Mar 30, 2019 by

My life is a bug. • edited Mar 30, 2019 by

Peter Cordes

4 Answers

I don't know any algorithm for splitting a large negative number into parts and preparing it for IDIV. I would calculate the absolute value of dividend and divisor, use the function DIV32 and lastly process the result according to the stored sign:

IDIV32 PROC      ; DX:AX / BX = DX/AX rem BX
    ; 99 / 5   =  19 rem 4
    ; 99 / -5  = -19 rem 4
    ; -99 / 5  = -19 rem -4
    ; -99 / -5 =  19 rem -4

    mov ch, dh          ; Only the sign bit counts!
    shr ch, 7           ; CH=1 means negative dividend
    mov cl, bh          ; Only the sign bit counts!
    shr cl, 7           ; CL=1 means negative divisor

    cmp ch, 1           ; DX:AX negative?
    jne J1              ; No: Skip the next two lines
    not dx              ; Yes: Negate DX:AX
    neg ax              ; CY=0 -> AX was NULL
    cmc
    adc dx, 0           ; Adjust DX, if AX was NULL
    J1:

    cmp cl, 1           ; BX negative?
    jne J2              ; No: Skip the next line
    neg bx              ; Yes: Negate BX
    J2:

    push cx             ; Preserve CX
    call DIV32
    pop cx              ; Restore CX

    cmp ch, cl          ; Had dividend and divisor the same sign?
    je J3               ; Yes: Skip the next two lines
    not dx
    neg ax              ; CY=0 -> AX was NULL
    cmc
    adc dx, 0           ; Adjust DX, if AX was NULL
    J3:

    cmp ch, 1           ; Was divisor negative?
    jnz J4              ; No: Skip the next line
    neg bx              ; Negate remainder
    J4:

    ret
IDIV32 ENDP

answered on Stack Overflow Mar 30, 2019 by

rkhb • edited Mar 31, 2019 by

rkhb

Your algorithm cannot be changed to signed that simply.

Let's take the calculation (+1)/(-1) as an example:

(+1)/(-1) = (-1), remainder 0

In the first step of your algorithm you are dividing the high bits by the divisor:

The high bits of (+1) are 0, so you are calculating:

0/(-1) = 0, remainder 0

The correct high bits of the entire 32-bit division are however 0FFFFh, not 0. And the reminder you require for the second division would also be 0FFFFh and not 0.

Oh, so the second IDIV should be a DIV. Ok, i'll test it when I wake up tomorrow. And i'll add an answer if I get it working.

The first division already does not produce the result you want. So changing the second division won't help much...

A divisor of 1, or -1 causes a division error seemingly regardless of dividend.

I would expect this only if bit 15 of the dividend is set and:

... you divide by 1 or
... you divide by -1 and at least one of the low 15 bits of the dividend is set

In these cases you are dividing:

... a number in the range 000008000h...00000FFFFh by 1
The result would be in the range +08000h...+0FFFFh
... a number in the range 000008001h...00000FFFFh by -1
The result would be in the range -0FFFFh...-08001h

... however, the result is a signed 16-bit value and therefore must be in the range -8000h...+7FFFh.

I just tried 12345678h/(+1) and 12345678h/(-1) in a virtual machine running DOS:

Bit 15 of 12345678h is not set; both times I don't get a division error. (But a wrong result when dividing by -1!)

answered on Stack Overflow Mar 30, 2019 by

Martin Rosenau

Using 2x idiv has a fundamental problem: we need the 2nd division to produce the low half of the quotient, which is unsigned and can be anything from 0 to 0xffff.

Only the highest word of a multi-word integer contains the sign bit, all bits below that have positive place-value. idiv's quotient range is -2^15 .. 2^15-1, not 0 .. 65535. Yes, idiv can produce all required values, but not from inputs we can get from simple fixups of the first division results. For example, 0:ffff / 1 will result in a #DE exception with idiv because the quotient doesn't fit in a signed 16-bit integer.

So the second division instruction has to be div, using the absolute value of the divisor, and an appropriate high half. (div requires both its inputs to be unsigned, so a signed remainder from a first idiv would be a problem, too.)

It might still be possible to use idiv for the first division, but only with fixups for its results before div, on top of still having to take the absolute value of the divisor and first-remainder to feed unsigned div. It's an interesting possibility, but in practice it's going to be cheaper to save and reapply signs around an unsigned division.

As @Martin points out, the first division of +1 / -1 with naive idiv gives the wrong high half quotient (0 / -1 = 0 not -1), and the wrong input for the 2nd division (0 % -1 = 0, not -1). TODO: figure out what fixups would actually be needed. Maybe just a conditional +-1, but we know the magnitude of the remainder can't get larger than the divisor, because high_half < divisor is necessary and sufficient for div to not fault.

Your -131,076/2 = -2 is (perhaps by coincidence) only off by 1 in one half of its result:
it should be 0xfffefffe = -2:-2 not -1:-2.

Optimized version of @rkhb's function, with DIV32 inlined.

We record input signs and then do unsigned division on the absolute values, and restore sign later. (If remainder sign isn't needed, we can simplify; the quotient sign only depends on xor dividend,divisor)

Or if the dividend is small enough, we can use one idiv. We have to avoid the -2^15 / -1 overflow case, though, so the fast check for DX being the sign extension of AX not only misses some safe cases (with larger divisors), but tries that unsafe case. If small numbers are common (like is the case in most computer programs), this fast test based on cwd is still maybe a good idea, with another test after the absolute-value calculation.

Branches are cheap-ish on 286, so I mostly kept the branching instead of using branchless abs(). (e.g. for a single register: with cdq (or sar reg,15) / xor / sub, like compilers make, based on the 2's complement identity that -x = ~x + 1). And mov/neg/cmovl of course isn't available until P6-family. If you need compat with 286 but mostly care about performance on modern CPUs, you might make different choices. But it turns out that a 32-bit branchless ABS is smaller code size than branching. However, it's probably slower than branchy for positive inputs, where some instructions would have been skipped. Assembler 8086 divide 32 bit number in 16 bit number has an interesting idea of creating 0/-1 integers for both dividend and divisor, and then for later re-applying the signs you can XOR those together and use the same XOR/SUB 2's complement bithack to sign-flip the results or not.

Style: local labels (within the function) prefixed with @@. I think this is normal for TASM, like NASM .label local labels.

   ; signed 32/16 => 32-bit division, using 32/16 => 16-bit division as the building block
   ; clobbers: CX, SI
IDIV32 PROC      ; DX:AX / BX = DX/AX rem BX
;global IDIV32_16       ; for testing with NASM under Linux
;IDIV32_16:
    ; 99 / 5   =  19 rem 4
    ; 99 / -5  = -19 rem 4
    ; -99 / 5  = -19 rem -4
    ; -99 / -5 =  19 rem -4

    mov   cx, dx          ; save high half before destroying it

;;; Check for simple case
    cwd                   ; sign-extend AX into DX:AX
    cmp   cx, dx          ; was it already correctly sign-extended?
    jne   @@dividend_32bit

    ; BUG: bx=-1  AX=0x8000 overflows with #DE
    ; also, this check rejects larger dividends with larger divisors
    idiv  bx
    mov   bx, dx
    cwd
    ret

;;; Full slow case: divide CX:AX by BX
   @@dividend_32bit:
    mov   si, ax                ; save low half
    mov   ax, cx                ; high half to AX for first div

                                ; CH holds dividend sign
    mov   cl, bh                ; CL holds divisor sign

 ;;; absolute value of inputs
    ; dividend in  AX:SI
    cwd                         ; 0 or -1
    xor   si, dx                ; flip all the bits (or not)
    xor   ax, dx
    sub   si, dx                ; 2's complement identity: -x = ~x - (-1)
    sbb   ax, dx                ; AX:SI = abs(dividend)

    test  bx, bx          ; abs(divisor)
    jnl  @@abs_divisor
    neg   bx
   @@abs_divisor:

 ;;; Unsigned division of absolute values
    xor   dx, dx
    div   bx             ; high half / divisor
    ; dx = remainder = high half for next division
    xchg  ax, si
    div   bx
 ;;; absolute result: rem=DX  quot=SI:AX
    mov   bx, dx
    mov   dx, si


 ;;; Then apply signs to the unsigned results
    test  cx,cx          ; remainder gets sign of dividend
    jns  @@remainder_nonnegative
    neg   bx
  @@remainder_nonnegative:

    xor   cl, ch         ; quotient is negative if opposite signs
    jns  @@quotient_nonnegative
    neg   dx
    neg   ax             ; subtract DX:AX from 0
    sbb   dx, 0          ; with carry
  @@quotient_nonnegative:

    ret
IDIV32 ENDP

Optimizations:

simpler sign-saving and sign-testing, using x86's built-in Sign Flag, set from the MSB of a result, and the js jump if SF==1. Avoids shifting the sign bit down to the bottom of an 8-bit register. Testing for same-signs can be done with xor/jns, because same signs will "cancel" and leave SF=0 whether it was both-0 or both-1. (In general, XOR can be used for comparison for equality, but it's usually only useful that way for bitwise cases where you care about one bit but not others).
Avoid writing CH by itself, for the benefit of modern Intel CPUs that do partial-register renaming for that case. This function never gets CH renamed separately from the rest of ECX. (On older CPUs like 286, there's no downside to mov cx,dx vs. mov ch,dh). We also avoid reading high-8 partial registers (e.g. test cx,cx instead of test ch,ch) because that has higher latency on recent Intel Sandybridge-family CPUs. (How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent). On P6-family, writing low-8 partial registers will rename them separately from the full register, so there it's probably best to just read 8-bit partial regs after writing any.

Of course, on modern CPUs 16-bit registers like cx are partial registers, even in 16-bit mode (because 32-bit registers are available there), so even mov cx,dx has a false dependency on the old value of ECX.

On 386+

Obviously on a 386+ where 32-bit registers / operand-size are available, you would use that even in 16-bit mode:

;; i386 version
    ;; inputs: DX:AX / BX
    shl   edx, 16
    mov   dx, ax         ; pack DX:AX into EDX
    mov   eax, edx

    movsx ebx, bx        ; sign-extend the inputs to 32 bit EBX
    cdq                  ; and 64-bit EDX:EAX
    idiv  ebx
    ; results: quotient in EAX, remainder in EDX

    mov   ebx, edx       ; remainder -> bx
    mov   edx, eax
    sar   edx, 16        ; extract high half of quotient to DX
    ;; result: quotient= DX:AX, remainder = BX

This can #DE from BX=0, or on overflow from DX:AX=-2^31 and BX=-1 (LONG_MIN/-1)

Test harness:

NASM wrapper to call from 32-bit mode

%if __BITS__ = 32
global IDIV32
IDIV32:
    push   esi
    push   ebx
    push   edi      ; not actually clobbered in this version
    movzx  eax, word [esp+4  + 12]
    movzx  edx, word [esp+6  + 12]
    movzx  ebx, word [esp+8  + 12]
    call   IDIV32_16

    shl    edx, 16
    mov    dx, ax
    mov    eax, edx

    movsx  edx, bx       ; pack outputs into EDX:EAX "int64_t"

    pop    edi
    pop    ebx
    pop    esi
    ret
%endif

C program, compile as 32-bit and link with the asm:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

// returns quotient in the low half, remainder in the high half (sign extended)
int64_t IDIV32(int32_t dxax, int16_t bx);

static int test(int a, short b) {
//  printf("\ntest %d / %d\n", a, b);
    int64_t result = IDIV32(a,b);

    int testrem = result>>32;
    int testquot = result;

    if (b==0 || (a==INT_MIN && b==-1)) {
        printf("successfully called with inputs which overflow in C\n"
               "%d/%d gave us %d rem %d\n",
               a,b, testquot, testrem);
        return 1;
    }
    int goodquot = a/b, goodrem = a%b;

    if (goodquot != testquot || goodrem != testrem) {
        printf("%d/%d = %d rem %d\t but we got %d rem %d\n",
               a,b, goodquot, goodrem, testquot, testrem);
        printf("%08x/%04hx = %08x rem %04hx\t but we got %08x rem %04hx\n"
               "%s quotient, %s remainder\n",
               a,b, goodquot, goodrem, testquot, testrem,
               goodquot == testquot ? "good" : "bad",
               goodrem == testrem ? "good" : "bad");
        return 0;
    }
    return 1;
}

int main(int argc, char*argv[])
{
    int a=1234, b=1;
    if(argc>=2) a = strtoll(argv[1], NULL, 0);  // 0x80000000 becomes INT_MIN instead of saturating to INT_MAX in 32-bit conversion
    if(argc>=3) b = strtoll(argv[2], NULL, 0);
    test(a, b);
    test(a, -b);
    test(-a, b);
    test(-a, -b);

    if(argc>=4) {
        int step=strtoll(argv[3], NULL, 0);
        while ( (a+=step) >= 0x7ffe) {  // don't loop through the single-idiv fast path
            // printf("testing %d / %d\n", a,b);
            test(a, b);
            test(-a, -b);
            test(a, -b);
            test(-a, b);
        }
        return 0;
    }
}

(This is sloppy between int and int32_t because I only care about it running on x86 Linux where that's the same type.)

Compile with

 nasm -felf32 div32-16.asm &&
 gcc -g -m32 -Wall -O3 -march=native -fno-pie -no-pie div32-test.c div32-16.o

Run with ./a.out 131076 -2 -1 to test all dividends from that to 0x7ffe (step=-1) with divisor=-2. (For all combinations of -a / -b, a / -b etc.)

I didn't do nested loops for quotient and divisor; you could do that with the shell. I also didn't do anything clever for testing some dividends around the max and some near the bottom of the range.

answered on Stack Overflow Apr 1, 2019 by

Peter Cordes

I re-wrote my idiv32 procedure so that it negates a negative dividend or divisor into a positive/unsigned form, performs the unsigned division, then negates the quotient if dividend XOR divisor is true.

EDIT: Use js and jns instead of testing against a bitmask of 80h. Don't bother returning remainder. Remainder should share the sign of the dividend, but since I don't really need the remainder, I won't bother making the procedure even more complex to correctly handle it.

idiv32 proc

;Divides a signed 32-bit value by a signed 16-bit value.

;alters ax
;alters bx
;alters dx

;expects the signed 32-bit dividend in dx:ax
;expects the signed 16-bit divisor in bx

;returns the signed 32-bit quotient in dx:ax

push cx
push di
push si

    mov ch, dh      ;ch <- sign of dividend
    xor ch, bh      ;ch <- resulting sign of dividend/divisor

    test dh, dh     ;Is sign bit of dividend set?  
    jns chk_divisor ;If not, check the divisors sign.
    xor di, di      ;If so, negate dividend.  
    xor si, si      ;clear di and si   
    sub di, ax      ;subtract low word from 0, cf set if underflow occurs
    sbb si, dx      ;subtract hi word + cf from 0, di:si <- negated dividend
    mov ax, di      
    mov dx, si      ;copy the now negated dividend into dx:ax

chk_divisor:
    xor di, di
    sbb di, bx      ;di <- negated divisor by default
    test bh, bh     ;Is sign bit of divisor set?
    jns uint_div    ;If not, bx is already unsigned. Begin unsigned division.
    mov bx, di      ;If so, copy negated divisor into bx.

uint_div:
    mov di, ax      ;di <- copy of LSW of given dividend
    mov ax, dx      ;ax <- MSW of given dividend
    xor dx, dx      ;dx:ax <- 0:MSW  
    div bx          ;ax:dx <- ax=MSW of final quotient, dx=remainder

    mov si, ax      ;si <- MSW of final quotient
    mov ax, di      ;dx:ax <- dx=previous remainder, ax=LSW of given dividend
    div bx          ;ax:dx <- ax=LSW of final quotient, dx=final remainder
    mov dx, si      ;dx:ax <- final quotient

    test ch, ch     ;Should the quotient be negative?
    js neg_quot     ;If so, negate the quotient.
pop si              ;If not, return. 
pop di
pop cx
    ret

neg_quot:
    xor di, di      
    xor si, si
    sub di, ax
    sbb si, dx
    mov ax, di
    mov dx, si      ;quotient negated
pop si
pop di
pop cx
    ret

idiv32 endp

answered on Stack Overflow Apr 3, 2019 by

My life is a bug. • edited Apr 3, 2019 by

My life is a bug.

User contributions licensed under CC BY-SA 3.0