# 32-bit / 16-bit signed integer division without 32-bit registers?

4

I'm trying to divide a 32-bit signed integer by a 16-bit signed integer to get a signed 32-bit quotient and 16-bit remainder.

I'm targeting a 286 with no fpu.

I've already written code in the past to do 32-bit unsigned division:

``````DIV32 PROC

;DIVIDES A 32-BIT VALUE BY A 16-BIT VALUE.

;ALTERS AX
;ALTERS BX
;ALTERS DX

;EXPECTS THE 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE 16-BIT DIVISOR IN BX

;RETURNS THE 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX

push di
push si

mov di, ax ;di -> copy of LSW of given dividend
mov ax, dx ;ax -> MSW of given dividend
xor dx, dx ;dx:ax -> 0:MSW
div bx     ;ax:dx -> ax=MSW of final quotient, dx=remainder

mov si, ax ;si -> MSW of final quotient
mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
div bx     ;ax:dx -> ax=LSW of final quotient, dx=final remainder
mov bx, dx ;bx -> final remainder
mov dx, si ;dx:ax -> final quotient

pop si
pop di
ret

DIV32 ENDP
``````

So far, i've tried to do the obvious thing and just modify my existing code by swapping out the `XOR DX, DX` with `CWD` and `DIV` with `IDIV`:

``````IDIV32 PROC

;DIVIDES A SIGNED 32-BIT VALUE BY A SIGNED 16-BIT VALUE.

;ALTERS AX
;ALTERS BX
;ALTERS DX

;EXPECTS THE SIGNED 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE SIGNED 16-BIT DIVISOR IN BX

;RETURNS THE SIGNED 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX

push di
push si

mov di, ax ;di -> copy of LSW of given dividend
mov ax, dx ;ax -> MSW of given dividend
cwd        ;dx:ax -> 0:MSW, or ffff:MSW
idiv bx    ;ax:dx -> ax=MSW of final quotient, dx=remainder

mov si, ax ;si -> MSW of final quotient
mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
idiv bx    ;ax:dx -> ax=LSW of final quotient, dx=final remainder
mov bx, dx ;bx -> final remainder
mov dx, si ;dx:ax -> final quotient

pop si
pop di
ret

IDIV32 ENDP
``````

This works for some calculations, like -654,328/2=-327164 (0xfff60408/2=fffb0204). But it doesn't work with certain inputs, -131,076/2 returns the incorrect result of -2 remainder 0. A divisor of 1, or -1 causes a division error seemingly regardless of dividend.

I've tested many different positive and negative dividends and divisors in an attempt to find some kind of pattern of correct and incorrect results, i've noticed that it can't correctly return a result of -65538.

I have a hunch that i'm supposed to use `CWD` conditionally depending on input, but it seems like `XOR DX, DX` returns incorrect results more often. Either work when both divisor and dividend are positive and the quotient is less than 0x7fffffff.

assembly
x86-16
integer-division

3

I don't know any algorithm for splitting a large negative number into parts and preparing it for IDIV. I would calculate the absolute value of dividend and divisor, use the function DIV32 and lastly process the result according to the stored sign:

``````IDIV32 PROC      ; DX:AX / BX = DX/AX rem BX
; 99 / 5   =  19 rem 4
; 99 / -5  = -19 rem 4
; -99 / 5  = -19 rem -4
; -99 / -5 =  19 rem -4

mov ch, dh          ; Only the sign bit counts!
shr ch, 7           ; CH=1 means negative dividend
mov cl, bh          ; Only the sign bit counts!
shr cl, 7           ; CL=1 means negative divisor

cmp ch, 1           ; DX:AX negative?
jne J1              ; No: Skip the next two lines
not dx              ; Yes: Negate DX:AX
neg ax              ; CY=0 -> AX was NULL
cmc
J1:

cmp cl, 1           ; BX negative?
jne J2              ; No: Skip the next line
neg bx              ; Yes: Negate BX
J2:

push cx             ; Preserve CX
call DIV32
pop cx              ; Restore CX

cmp ch, cl          ; Had dividend and divisor the same sign?
je J3               ; Yes: Skip the next two lines
not dx
neg ax              ; CY=0 -> AX was NULL
cmc
J3:

cmp ch, 1           ; Was divisor negative?
jnz J4              ; No: Skip the next line
neg bx              ; Negate remainder
J4:

ret
IDIV32 ENDP
``````
2

Your algorithm cannot be changed to signed that simply.

Let's take the calculation (+1)/(-1) as an example:

(+1)/(-1) = (-1), remainder 0

In the first step of your algorithm you are dividing the high bits by the divisor:

The high bits of (+1) are 0, so you are calculating:

0/(-1) = 0, remainder 0

The correct high bits of the entire 32-bit division are however 0FFFFh, not 0. And the reminder you require for the second division would also be 0FFFFh and not 0.

Oh, so the second IDIV should be a DIV. Ok, i'll test it when I wake up tomorrow. And i'll add an answer if I get it working.

The first division already does not produce the result you want. So changing the second division won't help much...

A divisor of 1, or -1 causes a division error seemingly regardless of dividend.

I would expect this only if bit 15 of the dividend is set and:

• ... you divide by 1 or
• ... you divide by -1 and at least one of the low 15 bits of the dividend is set

In these cases you are dividing:

• ... a number in the range 000008000h...00000FFFFh by 1
The result would be in the range +08000h...+0FFFFh
• ... a number in the range 000008001h...00000FFFFh by -1
The result would be in the range -0FFFFh...-08001h

... however, the result is a signed 16-bit value and therefore must be in the range -8000h...+7FFFh.

I just tried 12345678h/(+1) and 12345678h/(-1) in a virtual machine running DOS:

Bit 15 of 12345678h is not set; both times I don't get a division error. (But a wrong result when dividing by -1!)

1

Using 2x `idiv` has a fundamental problem: we need the 2nd division to produce the low half of the quotient, which is unsigned and can be anything from 0 to 0xffff.

Only the highest word of a multi-word integer contains the sign bit, all bits below that have positive place-value. `idiv`'s quotient range is `-2^15 .. 2^15-1`, not `0 .. 65535`. Yes, `idiv` can produce all required values, but not from inputs we can get from simple fixups of the first division results. For example, `0:ffff` / `1` will result in a #DE exception with `idiv` because the quotient doesn't fit in a signed 16-bit integer.

So the second division instruction has to be `div`, using the absolute value of the divisor, and an appropriate high half. (`div` requires both its inputs to be unsigned, so a signed remainder from a first `idiv` would be a problem, too.)

It might still be possible to use `idiv` for the first division, but only with fixups for its results before `div`, on top of still having to take the absolute value of the divisor and first-remainder to feed unsigned `div`. It's an interesting possibility, but in practice it's going to be cheaper to save and reapply signs around an unsigned division.

As @Martin points out, the first division of `+1 / -1` with naive `idiv` gives the wrong high half quotient (0 / -1 = 0 not -1), and the wrong input for the 2nd division (0 % -1 = 0, not -1). TODO: figure out what fixups would actually be needed. Maybe just a conditional +-1, but we know the magnitude of the remainder can't get larger than the divisor, because `high_half < divisor` is necessary and sufficient for `div` to not fault.

Your -131,076/2 = -2 is (perhaps by coincidence) only off by 1 in one half of its result:
it should be 0xfffefffe = -2:-2 not -1:-2.

Optimized version of @rkhb's function, with DIV32 inlined.

We record input signs and then do unsigned division on the absolute values, and restore sign later. (If remainder sign isn't needed, we can simplify; the quotient sign only depends on `xor dividend,divisor`)

Or if the dividend is small enough, we can use one `idiv`. We have to avoid the `-2^15 / -1` overflow case, though, so the fast check for DX being the sign extension of AX not only misses some safe cases (with larger divisors), but tries that unsafe case. If small numbers are common (like is the case in most computer programs), this fast test based on `cwd` is still maybe a good idea, with another test after the absolute-value calculation.

Branches are cheap-ish on 286, so I mostly kept the branching instead of using branchless `abs()`. (e.g. for a single register: with cdq (or `sar reg,15`) / xor / sub, like compilers make, based on the 2's complement identity that `-x = ~x + 1`). And `mov`/`neg`/`cmovl` of course isn't available until P6-family. If you need compat with 286 but mostly care about performance on modern CPUs, you might make different choices. But it turns out that a 32-bit branchless ABS is smaller code size than branching. However, it's probably slower than branchy for positive inputs, where some instructions would have been skipped. Assembler 8086 divide 32 bit number in 16 bit number has an interesting idea of creating 0/-1 integers for both dividend and divisor, and then for later re-applying the signs you can XOR those together and use the same XOR/SUB 2's complement bithack to sign-flip the results or not.

Style: local labels (within the function) prefixed with `@@`. I think this is normal for TASM, like NASM `.label` local labels.

``````   ; signed 32/16 => 32-bit division, using 32/16 => 16-bit division as the building block
; clobbers: CX, SI
IDIV32 PROC      ; DX:AX / BX = DX/AX rem BX
;global IDIV32_16       ; for testing with NASM under Linux
;IDIV32_16:
; 99 / 5   =  19 rem 4
; 99 / -5  = -19 rem 4
; -99 / 5  = -19 rem -4
; -99 / -5 =  19 rem -4

mov   cx, dx          ; save high half before destroying it

;;; Check for simple case
cwd                   ; sign-extend AX into DX:AX
cmp   cx, dx          ; was it already correctly sign-extended?
jne   @@dividend_32bit

; BUG: bx=-1  AX=0x8000 overflows with #DE
; also, this check rejects larger dividends with larger divisors
idiv  bx
mov   bx, dx
cwd
ret

;;; Full slow case: divide CX:AX by BX
@@dividend_32bit:
mov   si, ax                ; save low half
mov   ax, cx                ; high half to AX for first div

; CH holds dividend sign
mov   cl, bh                ; CL holds divisor sign

;;; absolute value of inputs
; dividend in  AX:SI
cwd                         ; 0 or -1
xor   si, dx                ; flip all the bits (or not)
xor   ax, dx
sub   si, dx                ; 2's complement identity: -x = ~x - (-1)
sbb   ax, dx                ; AX:SI = abs(dividend)

test  bx, bx          ; abs(divisor)
jnl  @@abs_divisor
neg   bx
@@abs_divisor:

;;; Unsigned division of absolute values
xor   dx, dx
div   bx             ; high half / divisor
; dx = remainder = high half for next division
xchg  ax, si
div   bx
;;; absolute result: rem=DX  quot=SI:AX
mov   bx, dx
mov   dx, si

;;; Then apply signs to the unsigned results
test  cx,cx          ; remainder gets sign of dividend
jns  @@remainder_nonnegative
neg   bx
@@remainder_nonnegative:

xor   cl, ch         ; quotient is negative if opposite signs
jns  @@quotient_nonnegative
neg   dx
neg   ax             ; subtract DX:AX from 0
sbb   dx, 0          ; with carry
@@quotient_nonnegative:

ret
IDIV32 ENDP
``````

Optimizations:

• simpler sign-saving and sign-testing, using x86's built-in Sign Flag, set from the MSB of a result, and the `js` jump if SF==1. Avoids shifting the sign bit down to the bottom of an 8-bit register. Testing for same-signs can be done with xor/jns, because same signs will "cancel" and leave SF=0 whether it was both-0 or both-1. (In general, XOR can be used for comparison for equality, but it's usually only useful that way for bitwise cases where you care about one bit but not others).

• Avoid writing CH by itself, for the benefit of modern Intel CPUs that do partial-register renaming for that case. This function never gets CH renamed separately from the rest of ECX. (On older CPUs like 286, there's no downside to `mov cx,dx` vs. `mov ch,dh`). We also avoid reading high-8 partial registers (e.g. `test cx,cx` instead of `test ch,ch`) because that has higher latency on recent Intel Sandybridge-family CPUs. (How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent). On P6-family, writing low-8 partial registers will rename them separately from the full register, so there it's probably best to just read 8-bit partial regs after writing any.

Of course, on modern CPUs 16-bit registers like `cx` are partial registers, even in 16-bit mode (because 32-bit registers are available there), so even `mov cx,dx` has a false dependency on the old value of ECX.

## On 386+

Obviously on a 386+ where 32-bit registers / operand-size are available, you would use that even in 16-bit mode:

``````;; i386 version
;; inputs: DX:AX / BX
shl   edx, 16
mov   dx, ax         ; pack DX:AX into EDX
mov   eax, edx

movsx ebx, bx        ; sign-extend the inputs to 32 bit EBX
cdq                  ; and 64-bit EDX:EAX
idiv  ebx
; results: quotient in EAX, remainder in EDX

mov   ebx, edx       ; remainder -> bx
mov   edx, eax
sar   edx, 16        ; extract high half of quotient to DX
;; result: quotient= DX:AX, remainder = BX
``````

This can #DE from BX=0, or on overflow from DX:AX=-2^31 and BX=-1 (`LONG_MIN/-1`)

## Test harness:

NASM wrapper to call from 32-bit mode

``````%if __BITS__ = 32
global IDIV32
IDIV32:
push   esi
push   ebx
push   edi      ; not actually clobbered in this version
movzx  eax, word [esp+4  + 12]
movzx  edx, word [esp+6  + 12]
movzx  ebx, word [esp+8  + 12]
call   IDIV32_16

shl    edx, 16
mov    dx, ax
mov    eax, edx

movsx  edx, bx       ; pack outputs into EDX:EAX "int64_t"

pop    edi
pop    ebx
pop    esi
ret
%endif
``````

C program, compile as 32-bit and link with the asm:

``````#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

// returns quotient in the low half, remainder in the high half (sign extended)
int64_t IDIV32(int32_t dxax, int16_t bx);

static int test(int a, short b) {
//  printf("\ntest %d / %d\n", a, b);
int64_t result = IDIV32(a,b);

int testrem = result>>32;
int testquot = result;

if (b==0 || (a==INT_MIN && b==-1)) {
printf("successfully called with inputs which overflow in C\n"
"%d/%d gave us %d rem %d\n",
a,b, testquot, testrem);
return 1;
}
int goodquot = a/b, goodrem = a%b;

if (goodquot != testquot || goodrem != testrem) {
printf("%d/%d = %d rem %d\t but we got %d rem %d\n",
a,b, goodquot, goodrem, testquot, testrem);
printf("%08x/%04hx = %08x rem %04hx\t but we got %08x rem %04hx\n"
"%s quotient, %s remainder\n",
a,b, goodquot, goodrem, testquot, testrem,
goodquot == testquot ? "good" : "bad",
goodrem == testrem ? "good" : "bad");
return 0;
}
return 1;
}

int main(int argc, char*argv[])
{
int a=1234, b=1;
if(argc>=2) a = strtoll(argv, NULL, 0);  // 0x80000000 becomes INT_MIN instead of saturating to INT_MAX in 32-bit conversion
if(argc>=3) b = strtoll(argv, NULL, 0);
test(a, b);
test(a, -b);
test(-a, b);
test(-a, -b);

if(argc>=4) {
int step=strtoll(argv, NULL, 0);
while ( (a+=step) >= 0x7ffe) {  // don't loop through the single-idiv fast path
// printf("testing %d / %d\n", a,b);
test(a, b);
test(-a, -b);
test(a, -b);
test(-a, b);
}
return 0;
}
}
``````

(This is sloppy between `int` and `int32_t` because I only care about it running on x86 Linux where that's the same type.)

Compile with

`````` nasm -felf32 div32-16.asm &&
gcc -g -m32 -Wall -O3 -march=native -fno-pie -no-pie div32-test.c div32-16.o
``````

Run with `./a.out 131076 -2 -1` to test all dividends from that to 0x7ffe (step=-1) with divisor=-2. (For all combinations of `-a / -b`, `a / -b` etc.)

I didn't do nested loops for quotient and divisor; you could do that with the shell. I also didn't do anything clever for testing some dividends around the max and some near the bottom of the range.

0

I re-wrote my `idiv32` procedure so that it negates a negative dividend or divisor into a positive/unsigned form, performs the unsigned division, then negates the quotient if dividend XOR divisor is true.

EDIT: Use `js` and `jns` instead of testing against a bitmask of 80h. Don't bother returning remainder. Remainder should share the sign of the dividend, but since I don't really need the remainder, I won't bother making the procedure even more complex to correctly handle it.

``````idiv32 proc

;Divides a signed 32-bit value by a signed 16-bit value.

;alters ax
;alters bx
;alters dx

;expects the signed 32-bit dividend in dx:ax
;expects the signed 16-bit divisor in bx

;returns the signed 32-bit quotient in dx:ax

push cx
push di
push si

mov ch, dh      ;ch <- sign of dividend
xor ch, bh      ;ch <- resulting sign of dividend/divisor

test dh, dh     ;Is sign bit of dividend set?
jns chk_divisor ;If not, check the divisors sign.
xor di, di      ;If so, negate dividend.
xor si, si      ;clear di and si
sub di, ax      ;subtract low word from 0, cf set if underflow occurs
sbb si, dx      ;subtract hi word + cf from 0, di:si <- negated dividend
mov ax, di
mov dx, si      ;copy the now negated dividend into dx:ax

chk_divisor:
xor di, di
sbb di, bx      ;di <- negated divisor by default
test bh, bh     ;Is sign bit of divisor set?
jns uint_div    ;If not, bx is already unsigned. Begin unsigned division.
mov bx, di      ;If so, copy negated divisor into bx.

uint_div:
mov di, ax      ;di <- copy of LSW of given dividend
mov ax, dx      ;ax <- MSW of given dividend
xor dx, dx      ;dx:ax <- 0:MSW
div bx          ;ax:dx <- ax=MSW of final quotient, dx=remainder

mov si, ax      ;si <- MSW of final quotient
mov ax, di      ;dx:ax <- dx=previous remainder, ax=LSW of given dividend
div bx          ;ax:dx <- ax=LSW of final quotient, dx=final remainder
mov dx, si      ;dx:ax <- final quotient

test ch, ch     ;Should the quotient be negative?
js neg_quot     ;If so, negate the quotient.
pop si              ;If not, return.
pop di
pop cx
ret

neg_quot:
xor di, di
xor si, si
sub di, ax
sbb si, dx
mov ax, di
mov dx, si      ;quotient negated
pop si
pop di
pop cx
ret

idiv32 endp
``````

User contributions licensed under CC BY-SA 3.0