Reading a char array passed as void* by incrementing the pointer, then reading chars and other data types?

0

To clear up any misunderstanding from the title (I wasn't sure how to phrase it): I want to read from a file (a char array) and pass it as a void* so that I can read values independently of their data type by incrementing the pointer. Here's a simple example in C of what I want to do:

char input[] = "D\0\0Ckjh\0";
char* pointer = &input[0];       //let's say 0x00000010
char type1 = *pointer;           //should be 'D'
pointer += sizeof(char);         //0x00000011
uint16_t value1 = *(uint16_t*)pointer; //should be 0
pointer += sizeof(uint16_t);     //0x00000013
char type2 = *pointer;           //should be 'C'
pointer += sizeof(char);         //0x00000014
uint32_t value2 = *(uint32_t*)pointer; //should be 1802135552

This is just for educational purposes, so I would like to know whether this is possible, or whether there is another way to achieve the same or a similar result. I'd also like to know how fast it is: would it be faster to keep the array and bit-shift the chars as I read them, or is the pointer approach actually faster?

Edit: updated the C code and changed void* to char*.

c
arrays
pointers
types
asked on Stack Overflow Aug 23, 2017 by CLover32 • edited Aug 23, 2017 by Sourav Ghosh

3 Answers

2

This is wrong in two ways:

  1. void is an incomplete type that cannot be completed. An incomplete type is a type without a known size. Pointer arithmetic requires the size to be known, and so does dereferencing a pointer. Some compilers treat void as having the size of a char, but that's an extension you should never rely on. Incrementing a pointer to void is not valid standard C.

  2. What you have is an array of char. Accessing this array through a pointer of a different type violates strict aliasing; you're not allowed to do that.

    That's actually not what your current code does -- looking at this line:

    uint32_t value2 = (int)*pointer; //should be 1802135552
    

    You're just converting the single byte (assuming your pointer points to char, see my first point) to a uint32_t. What you probably meant is

    uint32_t value2 = *(uint32_t *)pointer; //should be 1802135552
    

    which might do what you expect, but is technically undefined behavior.

The relevant reference for this second point is e.g. in §6.5 p7 in N1570, the latest draft for C11:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

One reason for this very strict rule is that it lets compilers optimize on the assumption that two pointers of different types (except char *) can never alias. Other reasons include alignment restrictions on some platforms.
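
Since access through a character type is the one case the rule always permits, the "bit-shifting" approach from the question is well defined. The following is a minimal sketch, not code from the question: the helper names read_u16_be, read_u32_be and read_u32_le are made up for illustration, and the byte order is an explicit choice that the question does not state.

```c
#include <stdint.h>

/* Hypothetical helpers (not from the original post): assemble integers
   from individual bytes. Reading through unsigned char is always
   permitted, and the byte order is chosen explicitly instead of being
   inherited from the host CPU. */
static uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint32_t read_u32_le(const unsigned char *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For the bytes 'k', 'j', 'h', 0 from the question, the big-endian reader yields 1802135552 (the value the question expects) and the little-endian reader yields 6842987, so the expected value implicitly assumes big-endian data.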

answered on Stack Overflow Aug 23, 2017 by (unknown user) • edited Aug 23, 2017 by (unknown user)
1

UPDATE:

In the updated code in the question, the line

   uint16_t value1 = *(uint16_t*)pointer;

violates strict aliasing. It's invalid code.

For more details, read the rest of the answer.


Initial version:

Technically, you are not allowed to dereference a void pointer in the first place.

Quoting C11, chapter §6.5.3.2

[...] If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. [...]

But void is a forever-incomplete type, so its storage size is not known; hence the dereference is not possible.

A GCC extension allows arithmetic on void pointers by treating void as if it had size 1 (like char), but it is better not to rely on this. Cast the pointer to either a character type or the actual (or a compatible) type, and then go ahead with the dereference.
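
A minimal sketch of that advice, assuming the buffer arrives as a const void *: convert to a character pointer before doing any arithmetic. The helper names advance_bytes and read_char_at are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: advance a generic pointer by n bytes.
   The arithmetic is done on unsigned char *, never on void * itself. */
static const void *advance_bytes(const void *p, size_t n)
{
    const unsigned char *bytes = p;  /* void * converts implicitly in C */
    return bytes + n;
}

/* Hypothetical helper: read one char located offset bytes into the buffer.
   Reading through a character type is always allowed. */
static char read_char_at(const void *p, size_t offset)
{
    const char *c = advance_bytes(p, offset);
    return *c;
}
```

With the question's buffer, read_char_at(input, 0) gives 'D' and read_char_at(input, 3) gives 'C'.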

That said, if you cast the pointer to a type that is neither a character type nor compatible with the object's actual type, and then dereference it, you'll violate the strict aliasing rule.

As mentioned in chapter §6.5,

An object shall have its stored value accessed only by an lvalue expression that has one of the following types

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of the object,

— a type that is the signed or unsigned type corresponding to the effective type of the object,

— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

— a character type.

and, chapter §6.3.2.3

[...] When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
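
The direction this paragraph permits can be sketched as follows: inspecting the bytes of a uint32_t through an unsigned char pointer. The function name object_bytes is an assumption made for this example, and the order in which the bytes come out depends on the host's endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the object representation of a uint32_t
   into out[], one byte at a time, through an unsigned char pointer. */
static void object_bytes(const uint32_t *value,
                         unsigned char out[sizeof(uint32_t)])
{
    /* After the conversion, p points to the lowest addressed byte. */
    const unsigned char *p = (const unsigned char *)value;

    /* Successive increments reach the remaining bytes of the object. */
    for (size_t i = 0; i < sizeof *value; ++i)
        out[i] = p[i];
}
```

Going the other way, from a char array to a uint32_t lvalue, is what the aliasing rule quoted above forbids.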

answered on Stack Overflow Aug 23, 2017 by Sourav Ghosh • edited Aug 23, 2017 by Sourav Ghosh
1

Even if you fix your code to cast the pointer to the correct type (like int *) before dereferencing it, you might have problems with alignment. For example, on some architectures you simply cannot read a 4-byte int if it is not aligned to a 4-byte word boundary.

A solution which would definitely work is to use something like this:

int result;
memcpy(&result, pointer, sizeof(result));
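
Applied to the buffer from the question, the same idea might look like the sketch below. The struct parsed and the function parse_buffer are hypothetical names introduced for this example; the layout (tag byte, 16-bit value, tag byte, 32-bit value) is taken from the question's code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record matching the question's buffer layout. */
struct parsed {
    char     type1;
    uint16_t value1;
    char     type2;
    uint32_t value2;
};

static struct parsed parse_buffer(const char *buf)
{
    struct parsed out;
    const char *p = buf;

    out.type1 = *p;                             /* plain char read */
    p += sizeof out.type1;

    memcpy(&out.value1, p, sizeof out.value1);  /* no aliasing or alignment issue */
    p += sizeof out.value1;

    out.type2 = *p;
    p += sizeof out.type2;

    memcpy(&out.value2, p, sizeof out.value2);  /* bytes land in host byte order */
    return out;
}
```

Note that memcpy keeps the bytes in memory order, so value2 is 1802135552 on a big-endian host and 6842987 on a little-endian one. On the speed question, optimizing compilers commonly turn a small fixed-size memcpy like this into a single load where the target allows it, but that is worth measuring rather than assuming.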
answered on Stack Overflow Aug 23, 2017 by aragaer

User contributions licensed under CC BY-SA 3.0