Serializing Array of Structs to byte[] - What's Wrong with my Code?


I have a generic method for serializing an array of any struct type into an array of bytes using Marshal.StructureToPtr and Marshal.Copy. The full code is:

    internal static byte[] SerializeArray<T>(T[] array) where T : struct
    {
        if (array == null)
            return null;
        if (array.Length == 0)
            return null;

        int position = 0;
        int structSize = Marshal.SizeOf(typeof(T));

        byte[] rawData = new byte[structSize * array.Length];

        IntPtr buffer = Marshal.AllocHGlobal(structSize);
        foreach (T item in array)
        {
            Marshal.StructureToPtr(item, buffer, false);
            Marshal.Copy(buffer, rawData, position, structSize );
            position += structSize;
        }
        Marshal.FreeHGlobal(buffer);

        return rawData;
    }

It works flawlessly 99.99% of the time. However, for one of my Windows 7 users, with certain input data this code will predictably cause the following non-.NET exception:

The data area passed to a system call is too small. (Exception from HRESULT: 0x8007007A).
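
For reference, the HRESULT itself can be unpacked to see which underlying Win32 error is being reported. The snippet below is only a sketch (the `HResultDecoder` class and `Decode` method are names made up for illustration); it splits the value into the failure bit, the facility, and the 16-bit code. Facility 7 is the Win32 facility, and Win32 error 122 (0x7A) is ERROR_INSUFFICIENT_BUFFER, which matches the message text above.

```csharp
using System;

static class HResultDecoder
{
    // Splits an HRESULT into its failure bit, facility, and 16-bit error code.
    // (Hypothetical helper, not part of the original code.)
    public static void Decode(int hresult, out bool failure, out int facility, out int code)
    {
        failure = hresult < 0;                // bit 31 set means failure
        facility = (hresult >> 16) & 0x7FF;   // bits 16..26
        code = hresult & 0xFFFF;              // bits 0..15
    }

    static void Main()
    {
        bool failure;
        int facility;
        int code;
        Decode(unchecked((int)0x8007007A), out failure, out facility, out code);

        // Facility 7 = FACILITY_WIN32, code 122 = ERROR_INSUFFICIENT_BUFFER
        Console.WriteLine("failure={0}, facility={1}, code={2}", failure, facility, code);
    }
}
```

This only identifies the error; it says nothing about which call raised it.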

Unfortunately I do not have access to the user's machine in order to attach a debugger, and I have not been able to replicate the issue even when dealing with the exact same input data as my user. This occurs only on the one user's machine and only with certain input data, but on her machine it happens every time with that same input data, so it's definitely not random.

The application targets .NET 4.5.

Can anyone see anything wrong with this code? My only guess is there is some mismatch occurring between what Marshal.SizeOf is reporting and the actual size of the data structure, thus leading to insufficient memory being allocated for the structure.

If it matters, here is the structure being serialized when the error occurs (it's a representation of character positions resulting from OCR):

public struct CharBox
{
    internal char Character;
    internal float Left;
    internal float Top;
    internal float Right;
    internal float Bottom;
}

As you can see all the fields should be constant size all the time, so my initial allocation of a single fixed-length segment of unmanaged memory into which to serialize each struct shouldn't be a problem (should it?).
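
One way to check that assumption on a given machine is to ask the marshaler directly for the size and field offsets it will use. This is a diagnostic sketch only (the `LayoutDump` class is a made-up name); it calls `Marshal.SizeOf` and `Marshal.OffsetOf` on the struct exactly as declared above, with no layout attributes.

```csharp
using System;
using System.Runtime.InteropServices;

public struct CharBox
{
    internal char Character;
    internal float Left;
    internal float Top;
    internal float Right;
    internal float Bottom;
}

static class LayoutDump
{
    static void Main()
    {
        Type t = typeof(CharBox);

        // Total unmanaged size, as used to allocate the buffer in SerializeArray.
        Console.WriteLine("Marshal.SizeOf = " + Marshal.SizeOf(t));

        // Unmanaged offset of each field, as seen by the marshaler.
        foreach (string name in new[] { "Character", "Left", "Top", "Right", "Bottom" })
        {
            Console.WriteLine(name + " @ " + Marshal.OffsetOf(t, name));
        }
    }
}
```

Logging this output from the affected machine would show whether the marshaled layout there differs from the one on a working machine.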

While I would welcome alternative or improved methods of doing the serialization, I'm far more interested in nailing down this particular bug. Thanks!

Update: Thanks to TnTinMn pointing out that char is not a blittable type, I looked for Unicode characters in the input to see whether they were marshaling correctly. It turns out they are NOT.

For the CharBox { 0x2022, .15782328, .266239136, .164901689, .271627158 }, the serialization (in hex) should be:

22 20 00 00 (Character*)

6D 9C 21 3E (Left)

7F 50 88 3E (Top)

FD DB 28 3E (Right)

B7 12 8B 3E (Bottom)

(* Since I wasn't using explicit layout, it padded to four bytes; I'm now frustrated with myself for needlessly increasing the data size by 11%...)

Instead, it is serializing as:

95 00 00 00 (Character)

6D 9C 21 3E (Left)

7F 50 88 3E (Top)

FD DB 28 3E (Right)

B7 12 8B 3E (Bottom)

So it is marshaling char 0x2022 as 0x95 instead. As it happens, 0x2022 Unicode and 0x95 ANSI are both the bullet character. Thus this is not random but rather it's marshaling everything to ANSI, which as I now recall is standard procedure if you don't specify a CharSet.
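
The difference can be reproduced in isolation. The sketch below uses two cut-down structs (`AnsiBox` and `UnicodeBox` are made-up names, not part of my application) that differ only in the CharSet given to StructLayout, and dumps the bytes each one marshals to. The byte produced for the ANSI version depends on the machine's ANSI code page, so no fixed output is assumed for it.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
struct AnsiBox
{
    public char Character;
    public float Left;
}

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct UnicodeBox
{
    public char Character;
    public float Left;
}

static class CharSetDemo
{
    // Marshals one struct to unmanaged memory and returns the raw bytes.
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        byte[] raw = new byte[size];
        IntPtr buffer = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(value, buffer, false);
            Marshal.Copy(buffer, raw, 0, size);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
        return raw;
    }

    static void Main()
    {
        var a = new AnsiBox { Character = '\u2022', Left = 0.15782328f };
        var u = new UnicodeBox { Character = '\u2022', Left = 0.15782328f };

        // Padding bytes after the character come from uninitialized memory
        // and may contain arbitrary values.
        Console.WriteLine("Ansi:    " + BitConverter.ToString(ToBytes(a)));
        Console.WriteLine("Unicode: " + BitConverter.ToString(ToBytes(u)));
    }
}
```

With CharSet.Unicode the character is written as two bytes (22 20 on a little-endian machine), with no code page conversion involved.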

Ok, so this at least confirms there is some unintended behavior going on, and further gives us a good working theory as to what conditions (namely, a Unicode character in the struct) might be leading to the error.

What it does not explain is why this would raise an exception at all, let alone why it isn't raised on any machine but this one user's. As to the former, a discrepancy in the byte size of Unicode vs. ANSI would, I suppose, be consistent with the error message ("The data area passed to a system call is too small"), but the unmanaged buffer, which is sized to accommodate 4 full bytes for the char, would be larger than necessary, not smaller. Why would the CLR or the OS be upset about writing only 1 byte to an area intended for 2 and large enough for 4?

As to the latter, I thought perhaps the user might be on a lower version of .NET than everyone else, which could be the case if she's not getting all the Windows 7 updates. But I just tried it out on a VM with a fresh Windows 7 install and .NET 4.5 (the lowest version the application supports) and still can't reproduce the error. I'm trying to find out exactly what .NET version she's got in case it's 4.5.1 or something. Still, this seems like a long shot.

It seems the only way to know for sure will be to change the Character member to an int (to keep the padding the same for existing data) and only cast it to char when necessary, and then see if that changes the result on the user's machine. This'll also be a good opportunity to wrap each distinct Marshal call in an exception handler as John suggested to see which, exactly, is causing the error.

The good news is this is a pretty low priority feature, so I can let it fail safely even if it continues to occur.

Will report back. Thanks all.

c#
.net
marshalling
unmanaged-memory
asked on Stack Overflow Jan 26, 2017 by Peter Moore • edited Jan 26, 2017 by Peter Moore

2 Answers


Well I found a solution that worked, though I still don't know why.

Here's what I changed. CharBox is now:

[StructLayout(LayoutKind.Explicit, CharSet = CharSet.Unicode)]
public struct CharBox
{
    [FieldOffset(0)]
    internal int Character;

    [FieldOffset(4)]
    internal float Left;

    [FieldOffset(8)]
    internal float Top;

    [FieldOffset(12)]
    internal float Right;

    [FieldOffset(16)]
    internal float Bottom;

    // Assists with error reporting
    public override string ToString()
    {
        return $"CharBox (Character = {this.Character}, Left = {this.Left}, Top = {this.Top}, Right = {this.Right}, Bottom = {this.Bottom})";
    }
}

And the actual method is now:

    internal static byte[] SerializeArray<T>(T[] array) where T : struct
    {
        if (array == null || array.Length == 0)
            return null;

        int position = 0;
        int structSize = Marshal.SizeOf(typeof(T));

        if (structSize < 1)
        {
            throw new Exception($"SerializeArray: invalid structSize ({structSize})");
        }

        byte[] rawData = new byte[structSize * array.Length];
        IntPtr buffer = IntPtr.Zero;

        try
        {
            buffer = Marshal.AllocHGlobal(structSize);
        }
        catch (Exception ex)
        {
            throw new Exception($"SerializeArray: Marshal.AllocHGlobal(structSize={structSize}) failed. Message: {ex.Message}");
        }

        try
        {
            int i = 0;
            int total = array.Length;
            foreach (T item in array)
            {
                try
                {
                    Marshal.StructureToPtr(item, buffer, false);
                }
                catch (Exception ex)
                {
                    throw new Exception($"SerializeArray: Marshal.StructureToPtr failed. item={item.ToString()}, index={i}/{total}. Message: {ex.Message}");
                }

                try
                {
                    Marshal.Copy(buffer, rawData, position, structSize);
                }
                catch (Exception ex)
                {
                    throw new Exception($"SerializeArray: Marshal.Copy failed. item={item.ToString()}, index={i}/{total}. Message: {ex.Message}");
                }

                i++;
                position += structSize;
            }
        }
        catch
        {
            throw;
        }
        finally
        {
            try
            {
                Marshal.FreeHGlobal(buffer);
            }
            catch (Exception ex)
            {
                throw new Exception($"Marshal.FreeHGlobal failed (buffer={buffer}). Message: {ex.Message}");
            }
        }

        return rawData;
    }

I was expecting just to get more detail on the error, but instead the user reported that it worked without any warning.

All the changes to SerializeArray were just for more detailed reporting, so the substantive changes, one or more of which were the winners, were:

  • Changing the char to an int (I would have used short but I wanted to stay compatible with existing data since this struct is used elsewhere, and previously it was using 4-byte padding).

  • Setting the struct layout to LayoutKind.Explicit and setting the explicit FieldOffsets; and

  • Specifying CharSet.Unicode in StructLayout, which admittedly probably did nothing since there are no more chars in the struct.
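
To keep the rest of the code working with chars while the stored field is an int, a small accessor can do the cast in one place. This is only a sketch of that idea: the `CharacterAsChar` property is a hypothetical addition, not something in the struct as posted above.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct CharBox
{
    [FieldOffset(0)]
    internal int Character;

    [FieldOffset(4)]
    internal float Left;

    [FieldOffset(8)]
    internal float Top;

    [FieldOffset(12)]
    internal float Right;

    [FieldOffset(16)]
    internal float Bottom;

    // Hypothetical convenience property: exposes the int field as a char
    // so callers never see the storage type.
    internal char CharacterAsChar
    {
        get { return (char)Character; }
        set { Character = value; }
    }
}

static class CharBoxDemo
{
    static void Main()
    {
        var box = new CharBox();
        box.CharacterAsChar = '\u2022';

        // The int field holds the UTF-16 code unit (0x2022 = 8226).
        Console.WriteLine(box.Character);
        Console.WriteLine(Marshal.SizeOf(typeof(CharBox)));
    }
}
```

Because every field is now an int or a float, no character conversion takes place when the struct is marshaled, and the size stays at 20 bytes.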

My guess is that setting the layout to Explicit and the CharSet to Unicode would have been enough to allow Character to be a char again, but I'd rather not waste my customer's time with more trial and error since it is working. Hopefully someone else can opine as to what happened, but I'll probably post this to MSDN too in the hopes that one of the CLR gods might have some insight.

Thanks all, especially TnTinMn, because highlighting the issue with chars and blittability definitely motivated me to try these changes.

answered on Stack Overflow Jan 26, 2017 by Peter Moore

I do not see any obvious error in your existing methodology, so I have nothing to offer on that front. However since you stated:

I would welcome alternative or improved methods of doing the serialization

I would like to throw this out for your consideration. Use a MemoryMappedViewAccessor to perform the transformation from array of structures to byte array. This of course requires creating a MemoryMappedFile.

internal static byte[] SerializeArray<T>(T[] array) where T : struct
    {
    int unmanagedSize = Marshal.SizeOf(typeof(T));

    int numBytes = array.Length * unmanagedSize;
    byte[] bytes = new byte[numBytes];

    using (MemoryMappedFile mmf = MemoryMappedFile.CreateNew("fred", bytes.Length))
        {
        using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor(0, bytes.Length, MemoryMappedFileAccess.ReadWrite))
            {

            accessor.WriteArray<T>(0, array, 0, array.Length);
            accessor.ReadArray<byte>(0, bytes, 0, bytes.Length);

            }
        }

    return bytes;
    }

internal static T[] DeSerializeArray<T>(byte[] bytes) where T : struct
    {
    int unmanagedSize = Marshal.SizeOf(typeof(T));

    int numItems = bytes.Length / unmanagedSize;
    T[] newArray = new T[numItems];

    using (MemoryMappedFile mmf = MemoryMappedFile.CreateNew("fred", bytes.Length))
        {
        using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor(0, bytes.Length, MemoryMappedFileAccess.ReadWrite))
            {

            accessor.WriteArray<byte>(0, bytes, 0, bytes.Length);
            accessor.ReadArray<T>(0, newArray, 0, newArray.Length);

            }
        }
    return newArray;
    }

Depending on your usage, you may need to provide a mechanism for a unique name (where I used "fred") for the MemoryMappedFile.
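
One possible way to sidestep the naming question, sketched below under the assumption that the map never needs to be opened from another process: MemoryMappedFile.CreateNew accepts null as the map name, which creates a map that is not shared by name. The `UnnamedMapDemo` class and `RoundTrip` method are made up for illustration and only copy bytes through the map.

```csharp
using System;
using System.IO.MemoryMappedFiles;

static class UnnamedMapDemo
{
    // Writes the input through an unnamed memory-mapped file and reads it back.
    public static byte[] RoundTrip(byte[] input)
    {
        byte[] output = new byte[input.Length];

        // A null map name means the map is not registered under any name,
        // so concurrent callers cannot collide on it.
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateNew(null, input.Length))
        using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor(0, input.Length, MemoryMappedFileAccess.ReadWrite))
        {
            accessor.WriteArray<byte>(0, input, 0, input.Length);
            accessor.ReadArray<byte>(0, output, 0, output.Length);
        }

        return output;
    }

    static void Main()
    {
        byte[] result = RoundTrip(new byte[] { 1, 2, 3, 4 });
        Console.WriteLine(BitConverter.ToString(result));
    }
}
```

If a name is required for some other reason, generating one per call (for example with Guid.NewGuid().ToString()) would be another option.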

answered on Stack Overflow Jan 26, 2017 by TnTinMn

User contributions licensed under CC BY-SA 3.0