MinGW64 bit field access across word-boundaries is wrong

1

I'm investigation how different compilers handle unaligned access of structure bitfields members as well as members that cross the primitive types' boundaries, and I think MinGW64 is bugged. My test program is:

#include <stdint.h>
#include <stdio.h>

/* Structure for testing element access

The crux is the ISO C99 6.7.2.1p10 item:

An implementation may allocate any addressable storage unit large enough to hold a bitfield.
If enough space remains, a bit-field that immediately follows another bit-field in a
structure shall be packed into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is
implementation-defined. The order of allocation of bit-fields within a unit (high-order to
low-order or low-order to high-order) is implementation-defined. The alignment of the
addressable storage unit is unspecified.
*/

typedef struct _my_struct
{
    /* word 0 */
    uint32_t    first           :32;    /**< A whole word element   */
    /* word 1 */
    uint32_t    second          :8;     /**< bits 7-0               */
    uint32_t    third           :8;     /**< bits 15-8              */
    uint32_t    fourth          :8;     /**< bits 23-16             */
    uint32_t    fifth           :8;     /**< bits 31-24             */
    /* word 2 */
    uint32_t    sixth           :16;    /**< bits 15-0              */
    uint32_t    seventh         :16;    /**< bits 31-16             */
    /* word 3 */
    uint32_t    eigth           :24;    /**< bits 23-0              */
    uint32_t    ninth           :8;     /**< bits 31-24             */
    /* word 4 */
    uint32_t    tenth           :8;     /**< bits 7-0               */
    uint32_t    eleventh        :24;    /**< bits 31-8              */
    /* word 5 */
    uint32_t    twelfth         :8;     /**< bits 7-0               */
    uint32_t    thirteeneth     :16;    /**< bits 23-8              */
    uint32_t    fourteenth      :8;     /**< bits 31-24             */
    /* words 6 & 7 */
    uint32_t    fifteenth       :16;    /**< bits 15-0              */
    uint32_t    sixteenth       :8;     /**< bits 23-16             */
    uint32_t    seventeenth     :16;    /**< bits 31-24 & 7-0       */
    /* word 7 */
    uint32_t    eighteenth      :24;    /**< bits 31-8              */
    /* word 8 */
    uint32_t    nineteenth      :32;    /**< bits 31-0              */
    /* words 9 & 10 */
    uint32_t    twentieth       :16;    /**< bits 15-0              */
    uint32_t    twenty_first    :32;    /**< bits 31-16 & 15-0      */
    uint32_t    twenty_second   :16;    /**< bits 31-16             */
    /* word 11 */
    uint32_t    twenty_third    :32;    /**< bits 31-0              */
} __attribute__((packed)) my_struct;


uint32_t buf[] = {
        0x11223344, 0x55667788, 0x99AABBCC, 0x01020304, /* words 0  - 3     */
        0x05060708, 0x090A0B0C, 0x0D0E0F10, 0x12131415, /* words 4  - 7     */
        0x16171819, 0x20212324, 0x25262728, 0x29303132, /* words 8  - 11    */
        0x34353637, 0x35363738, 0x39404142, 0x43454647  /* words 12 - 15    */
};

uint32_t data[64];

int main(void)
{
    my_struct *p;

    p = (my_struct*) buf;

    data[0] = 0;
    data[1] = p->first;
    data[2] = p->second;
    data[3] = p->third;
    data[4] = p->fourth;
    data[5] = p->fifth;
    data[6] = p->sixth;
    data[7] = p->seventh;
    data[8] = p->eigth;
    data[9] = p->ninth;
    data[10] = p->tenth;
    data[11] = p->eleventh;
    data[12] = p->twelfth;
    data[13] = p->thirteeneth;
    data[14] = p->fourteenth;
    data[15] = p->fifteenth;
    data[16] = p->sixteenth;
    data[17] = p->seventeenth;
    data[18] = p->eighteenth;
    data[19] = p->nineteenth;
    data[20] = p->twentieth;
    data[21] = p->twenty_first;
    data[22] = p->twenty_second;
    data[23] = p->twenty_third;

    if( p->fifth == 0x55 )
    {
        data[0] = 0xCAFECAFE;
    }
    else
    {
        data[0] = 0xDEADBEEF;
    }

    int i;
    for (i = 0; i < 24; ++i) {
        printf("data[%d] = 0x%0x\n", i, data[i]);
    }
    return data[0];
}

And the results I found are:

| Data Member | Type    | GCC Cortex M3  | GCC mingw64   | GCC Linux     | GCC Cygwin    |
|:------------|:-------:|:---------------|:--------------|:--------------|:--------------|
| data[0]     | uint32_t| 0x0            | 0xcafecafe    | 0xcafecafe    | 0xcafecafe    |
| data[1]     | uint32_t| 0x11223344     | 0x11223344    | 0x11223344    | 0x11223344    |
| data[2]     | uint32_t| 0x88           | 0x88          | 0x88          | 0x88          |
| data[3]     | uint32_t| 0x77           | 0x77          | 0x77          | 0x77          |
| data[4]     | uint32_t| 0x66           | 0x66          | 0x66          | 0x66          |
| data[5]     | uint32_t| 0x55           | 0x55          | 0x55          | 0x55          |
| data[6]     | uint32_t| 0xbbcc         | 0xbbcc        | 0xbbcc        | 0xbbcc        |
| data[7]     | uint32_t| 0x99aa         | 0x99aa        | 0x99aa        | 0x99aa        |
| data[8]     | uint32_t| 0x20304        | 0x20304       | 0x20304       | 0x20304       |
| data[9]     | uint32_t| 0x1            | 0x1           | 0x1           | 0x1           |
| data[10]    | uint32_t| 0x8            | 0x8           | 0x8           | 0x8           |
| data[11]    | uint32_t| 0x50607        | 0x50607       | 0x50607       | 0x50607       |
| data[12]    | uint32_t| 0xc            | 0xc           | 0xc           | 0xc           |
| data[13]    | uint32_t| 0xa0b          | 0xa0b         | 0xa0b         | 0xa0b         |
| data[14]    | uint32_t| 0x9            | 0x9           | 0x9           | 0x9           |
| data[15]    | uint32_t| 0xf10          | 0xf10         | 0xf10         | 0xf10         |
| data[16]    | uint32_t| 0xe            | 0xe           | 0xe           | 0xe           |
| data[17]    | uint32_t| 0x150d         | 0x1415        | 0x150d        | 0x150d        |
| data[18]    | uint32_t| 0x121314       | 0x171819      | 0x121314      | 0x121314      |
| data[19]    | uint32_t| 0x16171819     | 0x20212324    | 0x16171819    | 0x16171819    |
| data[20]    | uint32_t| 0x2324         | 0x2728        | 0x2324        | 0x2324        |
| data[21]    | uint32_t| 0x27282021     | 0x29303132    | 0x27282021    | 0x27282021    |
| data[22]    | uint32_t| 0x2526         | 0x3637        | 0x2526        | 0x2526        |
| data[23]    | uint32_t| 0x29303132     | 0x35363738    | 0x29303132    | 0x29303132    |

GCC Cortex M3 is
arm-none-eabi-gcc (GNU MCU Eclipse ARM Embedded GCC, 32-bit) 8.2.1 20181213 (release) [gcc-8-branch revision 267074]

GCC Mingw is
gcc.exe (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 8.1.0

GCC Linux is
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)

GCC Cygwin is
gcc (GCC) 7.4.0

All GCC versions seem to correctly handle unaligned access (like my_struct.thirteeneth).

The problem is not that members who cross the word boundary (my_struct.seventeenth) are different, as the C99 standard quoted above clearly states that the behaviour is implementation-defined. The problem is that all subsequent accesses are clearly incorrect (data[17] and on) even for aligned members (my_struct.nineteenth & my_struct.twenty_third). What's going on here, is this a bug or are these valid values?

c
gcc
struct
mingw-w64
bit-fields
asked on Stack Overflow Oct 21, 2019 by Leonardo • edited Oct 22, 2019 by Leonardo

3 Answers

3

The chances that a widely used compiler like GCC has a bug is not zero but really minimal. And odds are that PEBKAS. ;-)

Anyway, I have compiled your programm with "gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0" and got the same result as you in the column "mingw64".

A finer look reveals that the compiler aligns the bitfields on 32-bit boundaries which happens to be the width of an int. This conforms perfectly to chapter 6.7.2.1 of the standard C17 which states that the "straddling" (in its words of the annex J.3.9) is implementation-defined.

The other GCC variants are not aligning the bit fields and support crossing 32-bit boundaries.

It is clearly not a bug, the values are valid. It might be worth to research the reasons and perhaps post a feature request.

Edit:

Just to clarify, this is the layout with alignment. There is nothing wrong with elements seventeenth and following:

/* 0x11223344: word 0 */
uint32_t    first           :32;
/* 0x55667788: word 1 */
uint32_t    second          :8;
uint32_t    third           :8;
uint32_t    fourth          :8;
uint32_t    fifth           :8;
/* 0x99AABBCC: word 2 */
uint32_t    sixth           :16;
uint32_t    seventh         :16;
/* 0x01020304: word 3 */
uint32_t    eigth           :24;
uint32_t    ninth           :8;
/* 0x05060708: word 4 */
uint32_t    tenth           :8;
uint32_t    eleventh        :24;
/* 0x090A0B0C: word 5 */
uint32_t    twelfth         :8;
uint32_t    thirteeneth     :16;
uint32_t    fourteenth      :8;
/* 0x0D0E0F10: words 6 */
uint32_t    fifteenth       :16;
uint32_t    sixteenth       :8;
/* 0x12131415: word 7, because "seventeenth" does not fit in the space left */
uint32_t    seventeenth     :16;
/* 0x16171819: word 8, because "eighteenth" does not fit in the space left */
uint32_t    eighteenth      :24;
/* 0x20212324: word 9, because "nineteenth" does not fit in the space left */
uint32_t    nineteenth      :32;
/* 0x25262728: words 10 */
uint32_t    twentieth       :16;
/* 0x29303132: word 11, because "twenty_first" does not fit in the space left */
uint32_t    twenty_first    :32;
/* 0x34353637: word 12 */
uint32_t    twenty_second   :16;
/* 0x35363738: word 13, because "twenty_third" does not fit in the space left */
uint32_t    twenty_third    :32;
answered on Stack Overflow Oct 22, 2019 by the busybee • edited Oct 23, 2019 by the busybee
3

It is not bugged, it lays the bitfields according to windows ABI.

According to gcc docs:

If packed is used on a structure, or if bit-fields are used, it may be that the Microsoft ABI lays out the structure differently than the way GCC normally does.

Compile mingw64 version with -mno-ms-bitfields to fix the difference. Or compile all other versions with -mms-bitfields to lay out the structure the same as mingw.

answered on Stack Overflow Oct 23, 2019 by Kamil Cuk
1

You can not rely at all, in any way, on how bit-fields are arranged in a structure.

Per 6.7.2.1 Structure and union specifiers, paragraph 11 of the C11 standard (bolding mine):

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

You even quoted that. Given that, there is no "incorrect" way for an implementation to lay out a bit-field.

So you can not rely on the size of the bit-field container.

You can not rely on whether or not a bit-field crosses units.

You can not rely on the order of bit-fields within a unit.

Yet your question assumes you can do all that, even using terms such as "correct" when you see what you expected and "clearly incorrect" to describe bit-field layouts you didn't expect.

It's not "clearly incorrect".

If you need to know where a bit is in a structure, you simply can not portably use bit-fields.

In fact, all your effort on this question is a perfect case study in why you can't rely on bit-fields.

answered on Stack Overflow Oct 22, 2019 by Andrew Henle • edited Oct 22, 2019 by Andrew Henle

User contributions licensed under CC BY-SA 3.0