How to force C to interpret variables as signed or unsigned values?

0

I am working on a project where I often need to interpret certain variables as signed or unsigned values and do signed operations on them. However, in multiple cases subtle, seemingly insignificant changes swapped an unsigned interpretation for a signed one, while in other cases I couldn't force C to interpret a value as signed and it remained unsigned. Here are two examples:

int32_t pop();

//Version 1
push((int32_t)( (-1) * (pop() - pop()) ) );

//Version 2
int32_t temp1 = pop();
int32_t temp2 = pop();
push((int32_t)( (-1) * (temp1 - temp2) ) );

/*Another example */

//Version 1
int32_t get_signed_argument(uint8_t* argument) {
  return (int32_t)( (((int32_t)argument[0] << 8) & (int32_t)0x0000ff00 | (((int32_t)argument[1]) & (int32_t)0x000000ff) );
}

//Version 2
int16_t get_signed_argument(uint8_t* argument) {
  return (int16_t)( (((int16_t)argument[0] << 8) & (int16_t)0xff00 | (((int16_t)argument[1]) & (int16_t)0x00ff) );
}

In the first example, version 1 does not seem to multiply the value by -1, while version 2 does, yet the only difference between them is whether the intermediate values of the calculation are stored in temporary variables.

In the second example, version 1 returns the unsigned interpretation of the same bytes that version 2 interprets as a 2's complement signed value. The only difference is using int32_t instead of int16_t.

In both cases I am using signed types (int32_t, int16_t), but this doesn't seem to be sufficient to interpret them as signed values. Can you please explain why these differences cause a difference in signedness? Where can I find more information on this? How can I use the shorter version of the first example, but still get signed values? Thank you in advance!

c
signed
asked on Stack Overflow Jun 19, 2019 by Peter • edited Jun 19, 2019 by Peter

5 Answers

1

I assume pop() returns an unsigned type. If so, the expression pop() - pop() will be performed using unsigned arithmetic, which is modular and wraps around if the second pop() is larger than the first one (BTW, C doesn't specify a particular order of evaluation, so there's no guarantee which popped value will be first or second).

As a result, the value that you multiply by -1 might not be the difference you expect; if there was wraparound, it could be a large positive value rather than a negative value.
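To see the wraparound concretely, here is a minimal sketch (assuming, for illustration, that pop() returns uint32_t):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t a = 1, b = 2;
    /* Unsigned subtraction is modular: 1 - 2 wraps around to 2^32 - 1. */
    uint32_t diff = a - b;
    printf("%" PRIu32 "\n", diff);   /* prints 4294967295, not -1 */
    return 0;
}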

You can get the equivalent of the temporaries if you cast the function calls directly. (Casting just one of them is not enough here: if pop() returns a 32-bit unsigned type, the usual arithmetic conversions would convert the cast operand straight back to unsigned.)

push(-1 * ((int32_t)pop() - (int32_t)pop()));
answered on Stack Overflow Jun 19, 2019 by Barmar
1

If you just want to convert a binary buffer (for example, one received from somewhere) into longer signed integers, you can do it like this (I assume little-endian byte order):

#include <stdint.h>
#include <stdio.h>

int16_t bufftoInt16(const uint8_t *buff)
{
    /* The uint16_t operands are promoted to int, so the shift and OR
       are performed in int and converted to int16_t on return. */
    return (uint16_t)buff[0] | ((uint16_t)buff[1] << 8);
}

int32_t bufftoInt32(const uint8_t *buff)
{
    return (uint32_t)buff[0] | ((uint32_t)buff[1] << 8) | ((uint32_t)buff[2] << 16) | ((uint32_t)buff[3] << 24);
}

int32_t bufftoInt32_2bytes(const uint8_t *buff)
{
    /* Going through int16_t first makes the widening to int32_t
       sign-extend the 16-bit value. */
    int16_t result = (uint16_t)buff[0] | ((uint16_t)buff[1] << 8);
    return result;
}

int main(void)
{
    int16_t x = -5;
    int32_t y = -10;
    int16_t w = -5567;

    /* Cast through uint8_t * to read the objects' byte representations. */
    printf("%hd %d %d\n", bufftoInt16((const uint8_t *)&x),
                          bufftoInt32((const uint8_t *)&y),
                          bufftoInt32_2bytes((const uint8_t *)&w));

    return 0;
}

Converting bytes to a signed integer this way works completely differently from an unsigned shift: in bufftoInt32_2bytes, the assembled 16-bit value is converted to int16_t first, and that signed value is then sign-extended when it is widened to int32_t.

answered on Stack Overflow Jun 19, 2019 by 0___________ • edited Jun 19, 2019 by 0___________
0

The result of an expression in C has its type determined by the types of the component operands of that expression, not by any cast you may apply to that result. As Barmar comments above, to force the type of the result you must cast one of the operands.
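A minimal illustration of the difference (assuming, as elsewhere on this page, that pop() returns uint32_t):

#include <stdint.h>

uint32_t pop(void);   /* assumed unsigned, as in the question */

void example(void)
{
    /* The cast converts only the finished result; the subtraction
       itself is still performed in unsigned arithmetic: */
    int32_t late = (int32_t)(pop() - pop());

    /* Casting the operands makes the subtraction itself signed: */
    int32_t early = (int32_t)pop() - (int32_t)pop();

    (void)late; (void)early;
}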

answered on Stack Overflow Jun 19, 2019 by mlp
0

I am working on a project where I often need to interpret certain variables as signed or unsigned values and do signed operations on them.

That seems fraught. I take you to mean that you want to reinterpret objects' representations as having different types (varying only in signedness) in different situations, or perhaps that you want to convert values as if you were reinterpreting object representations. This sort of thing generally produces a mess, though you can handle it if you take sufficient care. That can be easier if you are willing to depend on details of your implementation, such as its representations of various types.

It is imperative in such matters to know and understand all the rules for implicit conversions, both the integer promotions and the usual arithmetic conversions, and under which circumstances they apply. It is essential to understand the effect of these rules on the evaluation of your expressions -- both the type and the value of all intermediate and final results.
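A small sketch of both rule sets in action (assuming a 32-bit int):

#include <stdint.h>

void rules_demo(void)
{
    uint8_t byte = 0xFF;
    /* Integer promotion: byte is promoted to int before the shift,
       so this computes 255 << 8 == 65280 in int arithmetic. */
    int shifted = byte << 8;

    uint32_t u = 1;
    int32_t  s = -1;
    /* Usual arithmetic conversions: s is converted to uint32_t
       (becoming 4294967295), so this comparison is true. */
    int surprising = (u < s);

    (void)shifted; (void)surprising;
}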

For example, the best you can hope for with respect to the cast in

push((int32_t)( (-1) * (temp1 - temp2) ) );

is that it is useless. If the value is not representable in that type then (it being a signed integer type) a signal may be raised, and if not, then the result is implementation-defined. If the value is representable, however, then the conversion does not change it. In any case, the result is not exempted from further conversion to the type of push()'s parameter.

For another example, the difference between version 1 and version 2 of your first example is largely which values are converted, when (but see also below). If the two indeed produce different results then it follows that the return type of pop() is different from int32_t. In that case, if you want to convert those to a different type to perform an operation on them then you must in fact do that. Your version 2 accomplishes that via assigning the pop() results to variables of the desired type, but it would be more idiomatic to perform the conversions via casts:

push((-1) * ((int32_t)pop() - (int32_t)pop()));

Beware, however, that if the results of the pop() calls depend on their order -- if they pop elements off a stack, for instance -- then you have a further problem: the relative order in which those operands are evaluated is unspecified, and you cannot safely assume that it will be consistent. For that reason, not because of typing considerations, your version 2 is preferable here.

Overall, however, if you have a stack whose elements may represent values of different types, then I would suggest making the element type a union (if the type of each element is implicit from context) or a tagged union (if elements need to carry information about their own types). For example,

union integer {
    int32_t  i;   /* "signed" and "unsigned" are keywords, so they cannot be member names */
    uint32_t u;
};

union integer pop();
void push(union integer i);

union integer first = pop();
union integer second = pop();
push((union integer){ .i = second.i - first.i });
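If elements must instead carry their own type information, a tagged variant might look like this (the names are illustrative):

enum int_kind { KIND_SIGNED, KIND_UNSIGNED };

struct tagged_integer {
    enum int_kind kind;   /* which union member is currently valid */
    union {
        int32_t  i;
        uint32_t u;
    } value;
};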
answered on Stack Overflow Jun 19, 2019 by John Bollinger • edited Jun 19, 2019 by John Bollinger
-1

To help you see what's happening in your code, I've included the text of the standard that explains how automatic type conversions are done (for integers), along with the section on bitwise shifting since that works a bit differently. I then step through your code to see exactly what intermediate types exist after each operation.

Relevant parts of the standard

6.3.1.1 Boolean, characters, and integers

  1. If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

6.3.1.8 Usual Arithmetic Conversions

(I'm just summarizing the relevant parts here.)

  1. Integer promotion is done.
  2. If they are both signed or both unsigned, they are both converted to the larger type.
  3. If the unsigned type's rank is greater than or equal to the signed type's rank, the signed operand is converted to the unsigned type.
  4. If the signed type can represent all values of the unsigned type, the unsigned type is converted to the signed one.
  5. Otherwise, they are both converted to the unsigned type of the same size as the signed type.

(Basically, if you've got a OP b, the common type will be the largest of int, type(a), and type(b), preferring a type that can represent all values of both operands; when a signed and an unsigned type of the same rank meet, the unsigned type wins. Most of the time, that means it'll be int.)
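You can watch these rules select the result type with C11's _Generic. A sketch, assuming int32_t and uint32_t are typedefs for int and unsigned int (as they are on most platforms):

#include <stdint.h>
#include <stdio.h>

#define TYPENAME(x) _Generic((x),       \
    int: "int",                         \
    unsigned int: "unsigned int",       \
    long: "long",                       \
    unsigned long: "unsigned long",     \
    default: "other")

int main(void)
{
    uint8_t  b = 1;
    int32_t  s = 1;
    uint32_t u = 1;

    puts(TYPENAME(b + b));   /* "int": both operands are promoted */
    puts(TYPENAME(s + u));   /* "unsigned int": equal rank, unsigned wins */
    return 0;
}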

6.5.7 Bitwise shift operators

  1. The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
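Concretely (a sketch, assuming a 32-bit int):

#include <stdint.h>

void shift_demo(void)
{
    /* Unsigned left shift is always defined; here the result is
       reduced modulo 2^32: */
    uint32_t wrapped = 0x80000000u << 1;   /* 0 */

    /* Signed, nonnegative, and representable: well defined. */
    int32_t ok = 1 << 30;                  /* 1073741824 */

    /* 1 << 31 would not be representable in a 32-bit int, so it
       would be undefined behavior. */

    (void)wrapped; (void)ok;
}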

How all that applies to your code.

I'm skipping the first example for now, since I don't know what type pop() returns. If you add that information to your question, I can address that example as well.

Let's step through what happens in this expression (note that you had an extra ( after the first cast in your version; I've removed that):

(((int32_t)argument[0] << 8) & (int32_t)0x0000ff00 | (((int32_t)argument[1]) & (int32_t)0x000000ff) )

Some of these conversions depend on the relative sizes of the types. Let INT_TYPE be the larger of int32_t and int on your system.

((int32_t)argument[0] << 8)

  1. argument[0] is explicitly cast to int32_t
  2. 8 is already an int, so no conversion happens
  3. (int32_t)argument[0] is converted to INT_TYPE.
  4. The left shift happens and the result has type INT_TYPE.

(Note that if argument[0] could have been negative, the shift would be undefined behavior. But since it was originally unsigned, you're safe here.)

Let a represent the result of those steps.

a & (int32_t)0x0000ff00

  1. 0x0000ff00 is explicitly cast to int32_t.
  2. Usual arithmetic conversions. Both sides are converted to INT_TYPE. Result is of type INT_TYPE.

Let b represent the result of those steps.

(((int32_t)argument[1]) & (int32_t)0x000000ff)

  1. Both of the explicit casts happen
  2. Usual arithmetic conversions are done. Both sides are now INT_TYPE.
  3. Result has type INT_TYPE.

Let c represent that result.

b | c

  1. Usual arithmetic conversions; no changes since they're both INT_TYPE.
  2. Result has type INT_TYPE.

Conclusion

So none of the intermediate results are unsigned here. (Also, most of the explicit casts were unnecessary, especially if sizeof(int) >= sizeof(int32_t) on your system).

Additionally, since you start with uint8_ts, never shift more than 8 bits, and are storing all the intermediate results in types of at least 32 bits, the top 16 bits will always be 0 and the values will all be non-negative, which means that the signed and unsigned types represent all the values you could have here exactly the same.

What exactly are you observing that makes you think it's using unsigned types where it should use signed ones? Can we see example inputs and outputs along with the outputs you expected?

Edit: Based on your comment, it appears that the reason it isn't working the way you expected is not that the type is unsigned, but that you're generating the bitwise representations of 16-bit signed ints while storing them in 32-bit signed ints. Get rid of all the casts you have other than the (int32_t)argument[0] ones, and change those to (int)argument[0] (int is generally the size the system operates on most efficiently, so your operations should use int unless you have a specific reason to use another size). Then cast the final result to int16_t.
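Put together, that suggestion might look like this (a sketch, not code from the question or the comments):

#include <stdint.h>

int16_t get_signed_argument(const uint8_t *argument)
{
    /* Assemble the two bytes in plain int arithmetic; the bytes are
       non-negative, so no masking is needed. */
    int value = ((int)argument[0] << 8) | (int)argument[1];

    /* The conversion to int16_t reinterprets the low 16 bits as a
       signed value (implementation-defined for values above 32767,
       but two's complement on essentially all modern systems). */
    return (int16_t)value;
}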

answered on Stack Overflow Jun 19, 2019 by Ray • edited Jun 20, 2019 by Ray

User contributions licensed under CC BY-SA 3.0