.NET Equivalent of _byteswap_ulong / OSSwapInt32 / or bswap32

4

Does anybody know if there is a .NET Framework equivalent for swapping the order of bytes in an uint?

I am trying to port some custom c hashing code that is using MSFTs _byteswap_ulong (the equivalent of Apples OSSwapInt32 or Bswap32 in *Nix world) to c#. I can write this function manually but I doubt it would take advantage of any compiler optimizations (for instance the c/c++ compiler provides intrinsicts that are hard to outperform I am hoping the runtime does the same with built in functionality). If it matters I don't care about preserving endianness.

I have tried a Generic based solution, but am not convinced this would be optimal.

BitConverter.ToUInt32(BitConverter.GetBytes(g).Reverse().ToArray<byte>(),0);

EDIT:

SO I figured out how many times this particular function gets called in a ten minute period on average (for one of our hash consumers). This function gets called 10,000,000,000 times. So I setup some micro profiling to see the performance of the c code versus the solutions provided below (and the one proposed above).

The C code runs that many operations (using the intrinsic) in approx. 1500 ms on my trusty laptop. The c# code I presented above runs in almost 2,689,581 ms. A huge difference. The c# code presented by Matthew Watson runs in almost 36000 ms. The first solution presented by Caramiriel runs in almost 115,014 ms and the second solution provided runs in almost 36000.

While none of these solutions come close to touching the speed of the intrinsic call, they are much better than my original solution (going from 44 minutes to 36 seconds for that many computations). This is perfectly acceptable for my application. Although it would be nice if the .NET compiler provided some of the same intrinsic capability that the native compilers do.

For completeness here is my C code for the microbenchmarking:

#include "stdafx.h"
#include "windows.h"

unsigned long Swap(unsigned int value)
{
    return _byteswap_uint64(value);
}

int _tmain(int argc, _TCHAR* argv[])
{
    unsigned int value = 0x01020304;
    unsigned long NUMITER = 10000000000;
    unsigned long a=0;
    unsigned long z=0;
    int throwAwayLoopCount = 5;

    for (int k = 0; k < throwAwayLoopCount; ++k)
    {
        a = GetTickCount();
        for (unsigned long  i = 0; i < NUMITER; ++i)
        {
            value = Swap(value);
        }
        z = GetTickCount();
        printf("Baseline, Cached: time is %4lld milliseconds: value%4lld\n", z-a,value);
    }

    printf("Baseline, Cached: time is %4lld milliseconds\n", z-a);

    return 0;
}

Here is the c# code for benchmarking the solutions provided:

namespace ByteSwapProfiler
{
    using System.Runtime.InteropServices;
    using System.Diagnostics;

    [StructLayout(LayoutKind.Explicit)]
    internal struct UInt32Union
    {
        [FieldOffset(0)]
        public UInt32 Value;
        [FieldOffset(0)]
        public byte Byte1;
        [FieldOffset(1)]
        public byte Byte2;
        [FieldOffset(2)]
        public byte Byte3;
        [FieldOffset(3)]
        public byte Byte4;
    }


    class Program
    {

        static uint ByteSwapNaive(uint g)
        {
            return BitConverter.ToUInt32(BitConverter.GetBytes(g).Reverse().ToArray<byte>(), 0);
        }

        static uint ByteSwapCaramiriel1(uint value)
        {
            unchecked
            {
                return ((value & 0xff000000) >> 24) |
                        ((value & 0x00ff0000) >> 8) |
                        ((value & 0x0000ff00) << 8) |
                        ((value & 0x000000ff) << 24);
            }
        }

        static uint ByteSwapCaramiriel2(UInt32Union src)
        {
            UInt32Union dest = new UInt32Union
                {
                    Byte1 = src.Byte4,
                    Byte2 = src.Byte3,
                    Byte3 = src.Byte2,
                    Byte4 = src.Byte1
                };

            return dest.Value;
        }

        static uint ByteSwapMatthewWatson(uint word)
        {
            return ((word >> 24) & 0x000000FF) | ((word >> 8) & 0x0000FF00) | ((word << 8) & 0x00FF0000) | ((word << 24) & 0xFF000000);            
        }

        static void Main(string[] args)
        {
            uint value= 0x01020304;
            UInt32Union src = new UInt32Union();
            src.Value = value;

            ulong NUMITER = 10000000000;
            uint throwAwayLoopCount = 5;
            var sw = new Stopwatch();
            string name = "Naive";
            //for (int k = 0; k < throwAwayLoopCount; ++k)
            {
                sw = Stopwatch.StartNew();
                for (ulong i = 0; i < NUMITER; ++i)
                {
                    value = ByteSwapNaive(value);
                }
                sw.Stop();
                Console.Write("{0,-13}, Cached: time is {1,7} milliseconds.  Value:{2} \n", name, (sw.ElapsedMilliseconds).ToString("0"),value);
            }

            Console.Write("{0,-13}, Cached: time is {1,7} milliseconds.\n", name, (sw.ElapsedMilliseconds).ToString("0"));


            name = "MatthewWatson";
            for (int k = 0; k < throwAwayLoopCount; ++k)
            {
                sw = Stopwatch.StartNew();
                for (ulong i = 0; i < NUMITER; ++i)
                {
                    value = ByteSwapMatthewWatson(value);
                }
                sw.Stop();
                Console.Write("{0,-13}, Cached: time is {1,7} milliseconds.  Value:{2} \n", name, (sw.ElapsedMilliseconds).ToString("0"), value);
            }
            Console.Write("{0,-13}, Cached: time is {1,7} milliseconds.\n", name, (sw.ElapsedMilliseconds).ToString("0"));

            name = "Caramiriel2";
            for (int k = 0; k < throwAwayLoopCount; ++k)
            {
                sw = Stopwatch.StartNew();
                for (ulong i = 0; i < NUMITER; ++i)
                {
                    value = ByteSwapCaramiriel2(src);
                }
                sw.Stop();
                Console.Write("{0,-13}, Cached: time is {1,7} milliseconds.  Value:{2} \n", name, (sw.ElapsedMilliseconds).ToString("0"), value);
            }
            
            Console.Write("{0,-13}, Cached: time is {1,7} milliseconds.\n", name, (sw.ElapsedMilliseconds).ToString("0"));

            name = "Caramiriel1";
            for (int k = 0; k < throwAwayLoopCount; ++k)
            {
                sw = Stopwatch.StartNew();
                for (ulong i = 0; i < NUMITER; ++i)
                {
                    value = ByteSwapCaramiriel1(value);
                }
                sw.Stop();
                Console.Write("{0,-13}, Cached: time is {1,7} milliseconds.  Value:{2} \n", name, (sw.ElapsedMilliseconds).ToString("0"), value);
            }
            
            Console.Write("{0,-13}, Cached: time is {1,7} milliseconds.\n", name, (sw.ElapsedMilliseconds).ToString("0"));
        }
    }
}
c#
asked on Stack Overflow Jun 22, 2013 by Matt Johnson • edited Feb 5, 2021 by FoxDeploy

4 Answers

6

I don't think you'll get anything as fast as _byteswap_ulong, but if you use this:

public static uint SwapBytes(uint word)
{
    return ((word>>24)&0x000000FF) | ((word>>8)&0x0000FF00) | ((word<<8)&0x00FF0000) | ((word<<24)&0xFF000000);            
}

at least the JIT optimiser will probably inline it.

answered on Stack Overflow Jun 22, 2013 by Matthew Watson
5

Starting with .NET Core 2.1, BinaryPrimitives.ReverseEndianness provides an optimized software implementation for this functionality. Starting with .NET Core 3.0, it's implemented with a JIT intrinsic which gets jitted into very efficient machine code that uses the bswap instruction.

answered on Stack Overflow May 17, 2019 by scott
3

One way is to work directly on the (U)Int32-type. This gives you the least overhead, such as the state machine that is being used in Linq and the method calls that are involved.

unchecked {     
    return 
            ((value & 0xff000000) >> 24) |
            ((value & 0x00ff0000) >> 8) |
            ((value & 0x0000ff00) << 8) |
            ((value & 0x000000ff) << 24);
}

This takes each byte apart in it's corresponding place within an integer number and moves (shifting) it into the right position. Finally the moved bytes are stitched (OR-ed) together. Unchecked just suppresses exceptions that may cause over/underflowing, which are not relevant right now (saves performance since there is no checking involved).

Or the C union way, though more readable, twice as slow on my VM:

[StructLayout(LayoutKind.Explicit)]
internal struct UInt32Union
{
    [FieldOffset(0)] public UInt32 Value;
    [FieldOffset(0)] public byte Byte1;
    [FieldOffset(1)] public byte Byte2;
    [FieldOffset(2)] public byte Byte3;
    [FieldOffset(3)] public byte Byte4;
}

static UInt32 Swap( UInt32 value )
{
    UInt32Union src = new UInt32Union
    src.Value = value;

    UInt32Union dest = new UInt32Union
        {
            Byte1 = src.Byte4,
            Byte2 = src.Byte3,
            Byte3 = src.Byte2,
            Byte4 = src.Byte1
        };

    return dest.Value;
}
answered on Stack Overflow Jun 22, 2013 by Caramiriel • edited Jun 22, 2013 by Caramiriel
0

You could use IPAddress.NetworkToHostOrder method and it would do it most effectively for int16 int32 int64, actually using single CPU instruction which would get inlined most of times.

Bot on Big-endian system this method would be no-operation, but I do not know about any BE system running current dotnet.

or you could use @scott's answer and BinaryPrimitives.ReverseEndianness but that's only for net core.

answered on Stack Overflow Aug 18, 2020 by Bogdan Mart

User contributions licensed under CC BY-SA 3.0