How to convert negative bit-represented numbers to their actual negative int value in python3?

Question

Hello, I have solved this LeetCode question: https://leetcode.com/problems/single-number-ii. The objective is to solve the problem in O(n) time and O(1) space. The code I wrote is the following:

from typing import List

class Solution:
    def singleNumber(self, nums: List[int]) -> int:
        counter = [0 for i in range(32)]
        result = 0
        for i in range(32):
            for num in nums:
                if ((num >> i) & 1):
                    counter[i] += 1
            result = result | ((counter[i] % 3) << i)
        return self.convert(result)
        #return result

    def convert(self,x):
        if x >= 2**31:
            x = (~x & 0xffffffff) + 1
            x = -x
        return x

The interesting part is the convert function. Because Python stores an int as an arbitrary-precision object rather than a 32-bit word, it does not know that the result is negative when the MSB of my counter is set to 1. I handle that by taking the 2's complement of the value and returning its negation.

Now someone else posted their solution with:

def convert(self, x):
    if x >= 2**31:
        x -= 2**32
    return x

And I can't figure out why that works. I need help understanding why this subtraction works.

python

python-3.x

algorithm

bit-manipulation

bit

asked on Stack Overflow May 17, 2019 by

d_darric • edited May 18, 2019 by

d_darric

4 Answers

Python integers have arbitrary precision. They never turn negative as you set higher bits, so fixed-width two's complement wrap-around does not happen on its own. You could manage negatives differently.

def singleNumber(nums):
    result = 0
    sign   = [1,-1][sum(int(n<0) for n in nums)%3]
    for i in range(32):
        counter = 0
        for num in nums:
            counter += (abs(num) >> i) & 1
        result = result | ((counter % 3) << i)
    return result * sign

This binary approach can be optimized and simplified like this:

def singleNumber(nums):
    result = 0
    for i in range(32):
        counter = sum(1 for n in nums if (n>>i)&1)
        if counter > 0: result |= (counter % 3) << i
    return result - 2*(result&(1<<31))
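The return line above folds the sign fix into plain arithmetic: `result & (1<<31)` is either 0 or 2**31, so subtracting twice that amount subtracts 2**32 exactly when bit 31 is set. A minimal check of that claim follows; the helper names `branch_free` and `with_branch` are made up for illustration and are not part of the answer.

```python
def branch_free(x):
    # x & (1 << 31) is 2**31 when bit 31 is set, otherwise 0,
    # so this subtracts 2**32 only for values >= 2**31.
    return x - 2 * (x & (1 << 31))


def with_branch(x):
    # The conditional form quoted in the question.
    if x >= 2**31:
        x -= 2**32
    return x


# Values on both sides of the sign-bit boundary (all fit in 32 bits).
samples = [0, 1, 2**31 - 1, 2**31, 2**31 + 5, 2**32 - 1]
print([branch_free(x) == with_branch(x) for x in samples])
```

Both forms assume the input already fits in 32 bits, which holds here because the loop only sets bits 0 to 31.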

If you like one liners, you can implement it using reduce() from functools:

result = reduce(lambda r,i:r|sum(1&(n>>i) for n in nums)%3<<i,range(32),sum(n<0 for n in nums)%3*(-1<<32))

Note that this approach always makes 32 passes through the data and is limited to numbers in the range -2^31 to 2^31-1. Widening this range increases the number of passes through the list of numbers (even if the list only contains small values). Also, since you're not using counter[i] outside of the i loop, you don't need a list to store the counters.

You could use base 3 instead of base 2 with a very similar approach (which also runs in O(n) time and O(1) space):

def singleNumber(nums):
    result = sign = 0
    for num in nums:
        if num<0 : sign += 1
        base3 = 1
        num   = abs(num)
        while num > 0 :
            num,rest   = divmod(num,3)
            rest,base3 = rest*base3, 3*base3
            if rest == 0 : continue
            digit  = result % base3
            result = result - digit + (digit+rest)%base3      
    return result * (1-sign%3*2)

This one has the advantage that it goes through the list only once (thus supporting iterators as input). It does not limit the range of values and runs the nested while loop as few times as possible (in accordance with the magnitude of each value).

The way it works is by adding digits independently in a base 3 representation and cycling the result (digit by digit) without applying a carry.

For example: [ 16, 16, 32, 16 ]

    Base10    Base 3    Base 3 digits  result (cumulative)
    ------    ------    -------------  ------
      16         121    0 | 1 | 2 | 1     121
      16         121    0 | 1 | 2 | 1     212 
      32        1012    1 | 0 | 1 | 2    1221
      16         121    0 | 1 | 2 | 1    1012
                        -------------
    sum of digits % 3   1 | 0 | 1 | 2  ==> 32
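The digit-by-digit addition without carry can be sketched on its own, separately from the full solution. The helpers `add_base3_no_carry` and `to_base3` are hypothetical names introduced only for this illustration, and the sketch handles non-negative values only (the answer's code tracks the sign separately).

```python
def add_base3_no_carry(acc, num):
    """Add num to acc digit by digit in base 3, discarding every carry."""
    result, place = 0, 1
    while acc > 0 or num > 0:
        acc, a = divmod(acc, 3)   # lowest remaining base 3 digit of acc
        num, b = divmod(num, 3)   # lowest remaining base 3 digit of num
        result += ((a + b) % 3) * place
        place *= 3
    return result


def to_base3(n):
    """Base 3 digits of a non-negative integer, as a string."""
    digits = ""
    while n > 0:
        n, d = divmod(n, 3)
        digits = str(d) + digits
    return digits or "0"


acc = 0
for value in [16, 16, 32, 16]:
    acc = add_base3_no_carry(acc, value)
    # Columns: value, its base 3 form, cumulative result in base 3.
    print(value, to_base3(value), to_base3(acc))

print(acc)  # 32
```

Any digit contributed three times sums to a multiple of 3 and cancels, so only the digits of the value that appears once survive.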

The while num > 0 loop processes the digits. It runs at most floor(log(V,3)) + 1 times, where V is the largest absolute value in the numbers list (that is, once per base 3 digit of V). As such it is similar to the for i in range(32) loop in the base 2 solution, except that it always uses the smallest possible range. For any bounded range of values, the number of iterations of that while loop is less than or equal to a constant, which preserves the O(n) complexity of the main loop.

I made a few performance tests and, in practice, the base3 version is only faster than the base2 approach when values are small. The base3 approach always performs fewer iterations but, when values are large, it loses out in total execution time because of the overhead of modulo vs bitwise operations.

In order for the base2 solution to always be faster than the base 3 approach, it needs to optimize its iterations through the bits by reversing the loop nesting (bits inside numbers instead of numbers inside bits):

def singleNumber(nums):
    bits   = [0]*len(bin(max(nums,key=abs)))
    sign   = 0 
    for num in nums:
        if num<0 : sign += 1 
        num = abs(num)
        bit = 0
        while num > 0:
            if num&1 : bits[bit] += 1
            bit  += 1
            num >>= 1
    result = sum(1<<bit for bit,count in enumerate(bits) if count%3)
    return result * [1,-1][sign%3]

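Now it will outperform the base 3 approach every time. As a side benefit, it is no longer limited by a value range. Note that, as written, max(nums, key=abs) reads the whole input before the main loop, so nums has to be a list (or another re-iterable sequence) rather than a one-shot iterator. The size of the bits array can be treated as a constant, so this is also an O(1) space solution.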

But, to be fair, if we apply the same optimization to the base 3 approach (i.e. using a list of base 3 'bits'), it comes out ahead again for all value sizes:

def singleNumber(nums):
    tribits = [0]*len(bin(max(nums,key=abs))) # enough slots for base 2 digits -> enough for base 3 digits
    sign    = 0 
    for num in nums:
        if num<0 : sign += 1 
        num = abs(num)
        base3 = 0
        while num > 0:
            digit = num%3
            if digit: tribits[base3] += digit
            base3  += 1
            num   //= 3
    result = sum(count%3 * 3**base3 for base3,count in enumerate(tribits) if count%3)
    return result * [1,-1][sign%3]


Counter from collections would give the expected result in O(n) time with a single line of code:

from collections import Counter
numbers = [1,0,1,0,1,0,99]
singleN = next(n for n,count in Counter(numbers).items() if count == 1)

Sets would also work in O(n):

distinct = set()
multiple = [n for n in numbers if n in distinct or distinct.add(n)]
singleN  = min(distinct.difference(multiple))

These last two solutions use extra memory that is proportional to the size of the list (i.e. not O(1) space). On the other hand, they run 30 times faster and they support any hashable data type in the list. They also support iterators.

answered on Stack Overflow May 17, 2019 by

Alain T. • edited May 18, 2019 by

Alain T.

The value of the highest bit of an unsigned n-bit number is 2ⁿ⁻¹.

The value of the highest bit of a signed two's complement n-bit number is -2ⁿ⁻¹.

The difference between those two values is 2ⁿ.

So if an unsigned n-bit number has the highest bit set, to convert to a two's complement signed number subtract 2ⁿ.

In a 32-bit number, if bit 31 is set, the number will be >= 2³¹, so the formula would be:

if n >= 2**31:
    n -= 2**32

I hope that makes it clear.
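As a hedged illustration of the rule above, here is a sketch that generalises it to any width and compares it with the invert-add-negate version from the question. The names `to_signed` and `invert_add_negate`, and the `bits` parameter, are assumptions made for this example.

```python
def to_signed(value, bits=32):
    """Interpret an unsigned `bits`-wide value as two's complement."""
    if value >= 1 << (bits - 1):   # highest bit is set
        value -= 1 << bits         # subtract 2**bits
    return value


def invert_add_negate(value):
    """The question's 32-bit version: invert, add one, then negate."""
    if value >= 2**31:
        value = -((~value & 0xffffffff) + 1)
    return value


for x in (0x7FFFFFFF, 0x80000000, 0xFFFFFFFB, 0xFFFFFFFF):
    print(hex(x), to_signed(x), invert_add_negate(x))
```

The two agree because, for a 32-bit x, `~x & 0xffffffff` equals 2**32 - 1 - x, so `-((~x & 0xffffffff) + 1)` simplifies to x - 2**32.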

answered on Stack Overflow May 18, 2019 by

Mark Tolonen

32-bit signed integers wrap around every 2**32, so a positive number with the sign bit set (i.e. >= 2**31) has the same binary representation as the negative number 2**32 less.
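The wrap-around can be observed from the other direction by masking a negative Python int down to 32 bits. This is only a sketch; `MASK32` and `bit_pattern32` are illustrative names.

```python
MASK32 = 0xFFFFFFFF


def bit_pattern32(n):
    """Low 32 bits of n, returned as a non-negative Python int."""
    return n & MASK32


for n in (-1, -5, -2**31):
    pattern = bit_pattern32(n)
    # For negative n in the 32-bit range, pattern equals n + 2**32,
    # so subtracting 2**32 recovers n.
    print(n, hex(pattern), pattern - 2**32)
```

Values that differ by a multiple of 2**32 share the same pattern, which is why subtracting 2**32 from a pattern with the sign bit set gives the intended negative number.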

answered on Stack Overflow May 17, 2019 by

stark • edited May 17, 2019 by

stark

That is the very definition of two's complement code of a number A on n bits.

If A is positive, use the binary code of A.
If A is negative, use the binary code of 2^n+A (or 2^n-|A|). This number is the one you have to add to |A| to get 2^n (i.e. the complement of |A| to 2^n, hence the name of the two's complement method).

So, if you have a negative number B coded in two's complement, what is actually in its code is 2^n+B. To get its value, you have to subtract 2^n from the code.

There are many other definitions of two's complement (~A+1, ~(A-1), etc.), but this one is the most useful because it explains why adding signed two's complement numbers is identical to adding positive numbers. The number is in the code (with 2^n added if negative), and the result of the addition will be correct provided you ignore the 2^n that may be generated as a carry out (and there is no overflow). This arithmetic property is the main reason why two's complement is used in computers.
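The addition property can be tried out with a small width. The sketch below assumes n = 8 to keep the numbers readable; `encode`, `decode` and `add_codes` are hypothetical helpers written for this illustration.

```python
BITS = 8              # small width so the codes are easy to read
MOD = 1 << BITS       # 2**n


def encode(a):
    """Two's complement code of a: a itself if a >= 0, else 2**n + a."""
    return a % MOD


def decode(code):
    """Signed value of a code: subtract 2**n when the top bit is set."""
    return code - MOD if code >= MOD >> 1 else code


def add_codes(x, y):
    """Plain unsigned addition that drops the carry out of the top bit."""
    return (x + y) % MOD


print(decode(add_codes(encode(-3), encode(5))))    # 2
print(decode(add_codes(encode(-3), encode(-4))))   # -7
```

The adder never looks at the sign: it adds the codes as unsigned numbers, and the extra 2**n terms vanish when the carry out is dropped.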

answered on Stack Overflow May 17, 2019 by

Alain Merigot

User contributions licensed under CC BY-SA 3.0