I'm trying to divide a 32-bit signed integer by a 16-bit signed integer to get a signed 32-bit quotient and 16-bit remainder.
I'm targeting a 286 with no fpu.
I've already written code in the past to do 32-bit unsigned division:
DIV32 PROC
;DIVIDES A 32-BIT VALUE BY A 16-BIT VALUE.
;ALTERS AX
;ALTERS BX
;ALTERS DX
;EXPECTS THE 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE 16-BIT DIVISOR IN BX
;RETURNS THE 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX
push di
push si
mov di, ax ;di -> copy of LSW of given dividend
mov ax, dx ;ax -> MSW of given dividend
xor dx, dx ;dx:ax -> 0:MSW
div bx ;ax:dx -> ax=MSW of final quotient, dx=remainder
mov si, ax ;si -> MSW of final quotient
mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
div bx ;ax:dx -> ax=LSW of final quotient, dx=final remainder
mov bx, dx ;bx -> final remainder
mov dx, si ;dx:ax -> final quotient
pop si
pop di
ret
DIV32 ENDP
So far, i've tried to do the obvious thing and just modify my existing code by swapping out the XOR DX, DX
with CWD
and DIV
with IDIV
:
IDIV32 PROC
;DIVIDES A SIGNED 32-BIT VALUE BY A SIGNED 16-BIT VALUE.
;ALTERS AX
;ALTERS BX
;ALTERS DX
;EXPECTS THE SIGNED 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE SIGNED 16-BIT DIVISOR IN BX
;RETURNS THE SIGNED 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX
push di
push si
mov di, ax ;di -> copy of LSW of given dividend
mov ax, dx ;ax -> MSW of given dividend
cwd ;dx:ax -> 0:MSW, or ffff:MSW
idiv bx ;ax:dx -> ax=MSW of final quotient, dx=remainder
mov si, ax ;si -> MSW of final quotient
mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
idiv bx ;ax:dx -> ax=LSW of final quotient, dx=final remainder
mov bx, dx ;bx -> final remainder
mov dx, si ;dx:ax -> final quotient
pop si
pop di
ret
IDIV32 ENDP
This works for some calculations, like -654,328/2=-327164 (0xfff60408/2=fffb0204). But it doesn't work with certain inputs, -131,076/2 returns the incorrect result of -2 remainder 0. A divisor of 1, or -1 causes a division error seemingly regardless of dividend.
I've tested many different positive and negative dividends and divisors in an attempt to find some kind of pattern of correct and incorrect results, i've noticed that it can't correctly return a result of -65538.
I have a hunch that i'm supposed to use CWD
conditionally depending on input, but it seems like XOR DX, DX
returns incorrect results more often. Either work when both divisor and dividend are positive and the quotient is less than 0x7fffffff.
I don't know any algorithm for splitting a large negative number into parts and preparing it for IDIV. I would calculate the absolute value of dividend and divisor, use the function DIV32 and lastly process the result according to the stored sign:
IDIV32 PROC ; DX:AX / BX = DX/AX rem BX
; 99 / 5 = 19 rem 4
; 99 / -5 = -19 rem 4
; -99 / 5 = -19 rem -4
; -99 / -5 = 19 rem -4
mov ch, dh ; Only the sign bit counts!
shr ch, 7 ; CH=1 means negative dividend
mov cl, bh ; Only the sign bit counts!
shr cl, 7 ; CL=1 means negative divisor
cmp ch, 1 ; DX:AX negative?
jne J1 ; No: Skip the next two lines
not dx ; Yes: Negate DX:AX
neg ax ; CY=0 -> AX was NULL
cmc
adc dx, 0 ; Adjust DX, if AX was NULL
J1:
cmp cl, 1 ; BX negative?
jne J2 ; No: Skip the next line
neg bx ; Yes: Negate BX
J2:
push cx ; Preserve CX
call DIV32
pop cx ; Restore CX
cmp ch, cl ; Had dividend and divisor the same sign?
je J3 ; Yes: Skip the next two lines
not dx
neg ax ; CY=0 -> AX was NULL
cmc
adc dx, 0 ; Adjust DX, if AX was NULL
J3:
cmp ch, 1 ; Was divisor negative?
jnz J4 ; No: Skip the next line
neg bx ; Negate remainder
J4:
ret
IDIV32 ENDP
Your algorithm cannot be changed to signed that simply.
Let's take the calculation (+1)/(-1) as an example:
(+1)/(-1) = (-1), remainder 0
In the first step of your algorithm you are dividing the high bits by the divisor:
The high bits of (+1) are 0, so you are calculating:
0/(-1) = 0, remainder 0
The correct high bits of the entire 32-bit division are however 0FFFFh, not 0. And the reminder you require for the second division would also be 0FFFFh and not 0.
Oh, so the second IDIV should be a DIV. Ok, i'll test it when I wake up tomorrow. And i'll add an answer if I get it working.
The first division already does not produce the result you want. So changing the second division won't help much...
A divisor of 1, or -1 causes a division error seemingly regardless of dividend.
I would expect this only if bit 15 of the dividend is set and:
In these cases you are dividing:
... however, the result is a signed 16-bit value and therefore must be in the range -8000h...+7FFFh.
I just tried 12345678h/(+1) and 12345678h/(-1) in a virtual machine running DOS:
Bit 15 of 12345678h is not set; both times I don't get a division error. (But a wrong result when dividing by -1!)
Using 2x idiv
has a fundamental problem: we need the 2nd division to produce the low half of the quotient, which is unsigned and can be anything from 0 to 0xffff.
Only the highest word of a multi-word integer contains the sign bit, all bits below that have positive place-value. idiv
's quotient range is -2^15 .. 2^15-1
, not 0 .. 65535
. Yes, idiv
can produce all required values, but not from inputs we can get from simple fixups of the first division results. For example, 0:ffff
/ 1
will result in a #DE exception with idiv
because the quotient doesn't fit in a signed 16-bit integer.
So the second division instruction has to be div
, using the absolute value of the divisor, and an appropriate high half. (div
requires both its inputs to be unsigned, so a signed remainder from a first idiv
would be a problem, too.)
It might still be possible to use idiv
for the first division, but only with fixups for its results before div
, on top of still having to take the absolute value of the divisor and first-remainder to feed unsigned div
. It's an interesting possibility, but in practice it's going to be cheaper to save and reapply signs around an unsigned division.
As @Martin points out, the first division of +1 / -1
with naive idiv
gives the wrong high half quotient (0 / -1 = 0 not -1), and the wrong input for the 2nd division (0 % -1 = 0, not -1). TODO: figure out what fixups would actually be needed. Maybe just a conditional +-1, but we know the magnitude of the remainder can't get larger than the divisor, because high_half < divisor
is necessary and sufficient for div
to not fault.
Your -131,076/2 = -2 is (perhaps by coincidence) only off by 1 in one half of its result:
it should be 0xfffefffe = -2:-2 not -1:-2.
Optimized version of @rkhb's function, with DIV32 inlined.
We record input signs and then do unsigned division on the absolute values, and restore sign later. (If remainder sign isn't needed, we can simplify; the quotient sign only depends on xor dividend,divisor
)
Or if the dividend is small enough, we can use one idiv
. We have to avoid the -2^15 / -1
overflow case, though, so the fast check for DX being the sign extension of AX not only misses some safe cases (with larger divisors), but tries that unsafe case. If small numbers are common (like is the case in most computer programs), this fast test based on cwd
is still maybe a good idea, with another test after the absolute-value calculation.
Branches are cheap-ish on 286, so I mostly kept the branching instead of using branchless abs()
. (e.g. for a single register: with cdq (or sar reg,15
) / xor / sub, like compilers make, based on the 2's complement identity that -x = ~x + 1
). And mov
/neg
/cmovl
of course isn't available until P6-family. If you need compat with 286 but mostly care about performance on modern CPUs, you might make different choices. But it turns out that a 32-bit branchless ABS is smaller code size than branching. However, it's probably slower than branchy for positive inputs, where some instructions would have been skipped. Assembler 8086 divide 32 bit number in 16 bit number has an interesting idea of creating 0/-1 integers for both dividend and divisor, and then for later re-applying the signs you can XOR those together and use the same XOR/SUB 2's complement bithack to sign-flip the results or not.
Style: local labels (within the function) prefixed with @@
. I think this is normal for TASM, like NASM .label
local labels.
; signed 32/16 => 32-bit division, using 32/16 => 16-bit division as the building block
; clobbers: CX, SI
IDIV32 PROC ; DX:AX / BX = DX/AX rem BX
;global IDIV32_16 ; for testing with NASM under Linux
;IDIV32_16:
; 99 / 5 = 19 rem 4
; 99 / -5 = -19 rem 4
; -99 / 5 = -19 rem -4
; -99 / -5 = 19 rem -4
mov cx, dx ; save high half before destroying it
;;; Check for simple case
cwd ; sign-extend AX into DX:AX
cmp cx, dx ; was it already correctly sign-extended?
jne @@dividend_32bit
; BUG: bx=-1 AX=0x8000 overflows with #DE
; also, this check rejects larger dividends with larger divisors
idiv bx
mov bx, dx
cwd
ret
;;; Full slow case: divide CX:AX by BX
@@dividend_32bit:
mov si, ax ; save low half
mov ax, cx ; high half to AX for first div
; CH holds dividend sign
mov cl, bh ; CL holds divisor sign
;;; absolute value of inputs
; dividend in AX:SI
cwd ; 0 or -1
xor si, dx ; flip all the bits (or not)
xor ax, dx
sub si, dx ; 2's complement identity: -x = ~x - (-1)
sbb ax, dx ; AX:SI = abs(dividend)
test bx, bx ; abs(divisor)
jnl @@abs_divisor
neg bx
@@abs_divisor:
;;; Unsigned division of absolute values
xor dx, dx
div bx ; high half / divisor
; dx = remainder = high half for next division
xchg ax, si
div bx
;;; absolute result: rem=DX quot=SI:AX
mov bx, dx
mov dx, si
;;; Then apply signs to the unsigned results
test cx,cx ; remainder gets sign of dividend
jns @@remainder_nonnegative
neg bx
@@remainder_nonnegative:
xor cl, ch ; quotient is negative if opposite signs
jns @@quotient_nonnegative
neg dx
neg ax ; subtract DX:AX from 0
sbb dx, 0 ; with carry
@@quotient_nonnegative:
ret
IDIV32 ENDP
Optimizations:
simpler sign-saving and sign-testing, using x86's built-in Sign Flag, set from the MSB of a result, and the js
jump if SF==1. Avoids shifting the sign bit down to the bottom of an 8-bit register. Testing for same-signs can be done with xor/jns, because same signs will "cancel" and leave SF=0 whether it was both-0 or both-1. (In general, XOR can be used for comparison for equality, but it's usually only useful that way for bitwise cases where you care about one bit but not others).
Avoid writing CH by itself, for the benefit of modern Intel CPUs that do partial-register renaming for that case. This function never gets CH renamed separately from the rest of ECX. (On older CPUs like 286, there's no downside to mov cx,dx
vs. mov ch,dh
). We also avoid reading high-8 partial registers (e.g. test cx,cx
instead of test ch,ch
) because that has higher latency on recent Intel Sandybridge-family CPUs. (How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent). On P6-family, writing low-8 partial registers will rename them separately from the full register, so there it's probably best to just read 8-bit partial regs after writing any.
Of course, on modern CPUs 16-bit registers like cx
are partial registers, even in 16-bit mode (because 32-bit registers are available there), so even mov cx,dx
has a false dependency on the old value of ECX.
Obviously on a 386+ where 32-bit registers / operand-size are available, you would use that even in 16-bit mode:
;; i386 version
;; inputs: DX:AX / BX
shl edx, 16
mov dx, ax ; pack DX:AX into EDX
mov eax, edx
movsx ebx, bx ; sign-extend the inputs to 32 bit EBX
cdq ; and 64-bit EDX:EAX
idiv ebx
; results: quotient in EAX, remainder in EDX
mov ebx, edx ; remainder -> bx
mov edx, eax
sar edx, 16 ; extract high half of quotient to DX
;; result: quotient= DX:AX, remainder = BX
This can #DE from BX=0, or on overflow from DX:AX=-2^31 and BX=-1 (LONG_MIN/-1
)
NASM wrapper to call from 32-bit mode
%if __BITS__ = 32
global IDIV32
IDIV32:
push esi
push ebx
push edi ; not actually clobbered in this version
movzx eax, word [esp+4 + 12]
movzx edx, word [esp+6 + 12]
movzx ebx, word [esp+8 + 12]
call IDIV32_16
shl edx, 16
mov dx, ax
mov eax, edx
movsx edx, bx ; pack outputs into EDX:EAX "int64_t"
pop edi
pop ebx
pop esi
ret
%endif
C program, compile as 32-bit and link with the asm:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>
// returns quotient in the low half, remainder in the high half (sign extended)
int64_t IDIV32(int32_t dxax, int16_t bx);
static int test(int a, short b) {
// printf("\ntest %d / %d\n", a, b);
int64_t result = IDIV32(a,b);
int testrem = result>>32;
int testquot = result;
if (b==0 || (a==INT_MIN && b==-1)) {
printf("successfully called with inputs which overflow in C\n"
"%d/%d gave us %d rem %d\n",
a,b, testquot, testrem);
return 1;
}
int goodquot = a/b, goodrem = a%b;
if (goodquot != testquot || goodrem != testrem) {
printf("%d/%d = %d rem %d\t but we got %d rem %d\n",
a,b, goodquot, goodrem, testquot, testrem);
printf("%08x/%04hx = %08x rem %04hx\t but we got %08x rem %04hx\n"
"%s quotient, %s remainder\n",
a,b, goodquot, goodrem, testquot, testrem,
goodquot == testquot ? "good" : "bad",
goodrem == testrem ? "good" : "bad");
return 0;
}
return 1;
}
int main(int argc, char*argv[])
{
int a=1234, b=1;
if(argc>=2) a = strtoll(argv[1], NULL, 0); // 0x80000000 becomes INT_MIN instead of saturating to INT_MAX in 32-bit conversion
if(argc>=3) b = strtoll(argv[2], NULL, 0);
test(a, b);
test(a, -b);
test(-a, b);
test(-a, -b);
if(argc>=4) {
int step=strtoll(argv[3], NULL, 0);
while ( (a+=step) >= 0x7ffe) { // don't loop through the single-idiv fast path
// printf("testing %d / %d\n", a,b);
test(a, b);
test(-a, -b);
test(a, -b);
test(-a, b);
}
return 0;
}
}
(This is sloppy between int
and int32_t
because I only care about it running on x86 Linux where that's the same type.)
Compile with
nasm -felf32 div32-16.asm &&
gcc -g -m32 -Wall -O3 -march=native -fno-pie -no-pie div32-test.c div32-16.o
Run with ./a.out 131076 -2 -1
to test all dividends from that to 0x7ffe (step=-1) with divisor=-2. (For all combinations of -a / -b
, a / -b
etc.)
I didn't do nested loops for quotient and divisor; you could do that with the shell. I also didn't do anything clever for testing some dividends around the max and some near the bottom of the range.
I re-wrote my idiv32
procedure so that it negates a negative dividend or divisor into a positive/unsigned form, performs the unsigned division, then negates the quotient if dividend XOR divisor is true.
EDIT: Use js
and jns
instead of testing against a bitmask of 80h. Don't bother returning remainder. Remainder should share the sign of the dividend, but since I don't really need the remainder, I won't bother making the procedure even more complex to correctly handle it.
idiv32 proc
;Divides a signed 32-bit value by a signed 16-bit value.
;alters ax
;alters bx
;alters dx
;expects the signed 32-bit dividend in dx:ax
;expects the signed 16-bit divisor in bx
;returns the signed 32-bit quotient in dx:ax
push cx
push di
push si
mov ch, dh ;ch <- sign of dividend
xor ch, bh ;ch <- resulting sign of dividend/divisor
test dh, dh ;Is sign bit of dividend set?
jns chk_divisor ;If not, check the divisors sign.
xor di, di ;If so, negate dividend.
xor si, si ;clear di and si
sub di, ax ;subtract low word from 0, cf set if underflow occurs
sbb si, dx ;subtract hi word + cf from 0, di:si <- negated dividend
mov ax, di
mov dx, si ;copy the now negated dividend into dx:ax
chk_divisor:
xor di, di
sbb di, bx ;di <- negated divisor by default
test bh, bh ;Is sign bit of divisor set?
jns uint_div ;If not, bx is already unsigned. Begin unsigned division.
mov bx, di ;If so, copy negated divisor into bx.
uint_div:
mov di, ax ;di <- copy of LSW of given dividend
mov ax, dx ;ax <- MSW of given dividend
xor dx, dx ;dx:ax <- 0:MSW
div bx ;ax:dx <- ax=MSW of final quotient, dx=remainder
mov si, ax ;si <- MSW of final quotient
mov ax, di ;dx:ax <- dx=previous remainder, ax=LSW of given dividend
div bx ;ax:dx <- ax=LSW of final quotient, dx=final remainder
mov dx, si ;dx:ax <- final quotient
test ch, ch ;Should the quotient be negative?
js neg_quot ;If so, negate the quotient.
pop si ;If not, return.
pop di
pop cx
ret
neg_quot:
xor di, di
xor si, si
sub di, ax
sbb si, dx
mov ax, di
mov dx, si ;quotient negated
pop si
pop di
pop cx
ret
idiv32 endp
User contributions licensed under CC BY-SA 3.0