I'm working on a database that has a VARBINARY(255) column that doesn't make sense to me. Depending on the length of the value, the value is either a number or a word.
Every number is stored as a 4-byte hex value such as 0x00000000, but although the hex string reads left to right, the bytes are ordered right to left. So a number such as 255 is stored as 0xFF000000, and a number such as 745 is stored as 0xE9020000. This is the part that I do not understand: why is it stored that way instead of as 0x02E9, 0x2E9, or 0x000002E9?
When it comes to words, each character is stored as a 4-byte hex value just like above. A space is stored as 0x20000000, and a word like Sensor is stored as 0x53000000650000006E000000730000006F00000072000000 instead of just 0x53656E736F72.
Can anyone explain to me why the data is stored in this way? Is everything represented as 4-byte values because the numbers stored can use the full 4 bytes while text is padded with zeros for consistency? Why are the zeros padded to the right of the value? Why are the values stored with the 4th byte first and the 1st byte last?
If none of this makes sense from an SQL standpoint, I suppose it is possible that the data is being provided this way by the client application, whose source I do not have access to. Could that be the case?
Lastly, I would like to create a report that includes this column, but converted to the correct numbers or words. Is there a simpler and more performant method than using substrings, trims, and recursion?
With the help of Smor in the comments above, I can now answer my own questions.
The client application provides the 4-byte values, and the database simply stores them as-is because they fit within the column's VARBINARY(255) data type and length. Since the application provides the values in little-endian format, they are stored that way in the database, with the least significant byte first and the most significant byte last. Because most values are smaller than the fixed 4-byte width, the unused high-order bytes are zero, and in little-endian order those zeros end up on the right.
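To see the byte order at work, here is a minimal sketch that casts the literals from the question directly to INT. It uses no table; the column aliases are made up for illustration.

```sql
-- SQL Server reads a binary value most significant byte first (big-endian)
-- when casting it to INT, so only the big-endian form yields 745.
SELECT
    CAST(0x000002E9 AS INT) AS BigEndianRead,   -- 745
    CAST(0xE9020000 AS INT) AS StoredReadAsIs;  -- not 745: E9 is read as the high byte
```

The second column shows why the stored bytes have to be reordered before they can be cast to a number.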
Now as to my question of the report, this is what I came up with:
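CASE
    -- 4 bytes or fewer: a little-endian integer; reverse the bytes, then cast
    WHEN DATALENGTH(ByteValue) <= 4
        THEN CAST(CAST(CAST(REVERSE(ByteValue) AS VARBINARY(4)) AS INT) AS VARCHAR(12))
    -- Longer values: text at 4 bytes per character; strip the three padding bytes.
    -- VARCHAR(512) holds '0x' plus two hex digits for each of up to 255 bytes.
    ELSE CAST(CONVERT(VARBINARY(255), REPLACE(CONVERT(VARCHAR(512), ByteValue, 1), '000000', ''), 1) AS VARCHAR(100))
END AS PlainValue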
In my particular case, only numbers are stored as values of 4 bytes or fewer, while words are stored as much longer values. This lets me treat the shorter values as numbers and the longer values as words.
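As a quick check of that split, the sketch below runs the expression against a few of the sample values from the question. The table variable @Samples is hypothetical and stands in for the real table; DATALENGTH and the VARCHAR(512) width are choices made for this sketch, and it assumes a single-byte default collation.

```sql
-- Hypothetical stand-in for the real table.
DECLARE @Samples TABLE (ByteValue VARBINARY(255));

INSERT INTO @Samples (ByteValue) VALUES
    (0xFF000000),                                          -- expected '255'
    (0xE9020000),                                          -- expected '745'
    (0x53000000650000006E000000730000006F00000072000000);  -- expected 'Sensor'

SELECT
    ByteValue,
    CASE
        WHEN DATALENGTH(ByteValue) <= 4
            THEN CAST(CAST(CAST(REVERSE(ByteValue) AS VARBINARY(4)) AS INT) AS VARCHAR(12))
        ELSE CAST(CONVERT(VARBINARY(255),
                          REPLACE(CONVERT(VARCHAR(512), ByteValue, 1), '000000', ''),
                          1) AS VARCHAR(100))
    END AS PlainValue
FROM @Samples;
```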
Using CASE WHEN, I can specify that only data of 4 bytes or fewer needs the REVERSE() function, as it is the easiest way to convert the little-endian format to the big-endian format that SQL expects when converting from hex to integers. Because REVERSE() returns a character type (VARCHAR or NVARCHAR) rather than binary, I then have to convert the result back to VARBINARY, then to INT, and then to VARCHAR so that it matches the data type of the second case.
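The chain of conversions for the numeric branch can be followed one step at a time. This is only a sketch: the variable @ByteValue is a stand-in for the column, and it assumes a single-byte default collation so that REVERSE() treats each byte as one character.

```sql
DECLARE @ByteValue VARBINARY(255) = 0xE9020000;  -- 745 as stored (little-endian)

SELECT
    -- Step 1: flip the byte order and return to a binary type
    CAST(REVERSE(@ByteValue) AS VARBINARY(4)) AS BigEndianBytes,        -- 0x000002E9
    -- Step 2: read the big-endian bytes as an integer
    CAST(CAST(REVERSE(@ByteValue) AS VARBINARY(4)) AS INT) AS AsInt,    -- 745
    -- Step 3: convert to text so both CASE branches share a type
    CAST(CAST(CAST(REVERSE(@ByteValue) AS VARBINARY(4)) AS INT) AS VARCHAR(12)) AS AsText;  -- '745'
```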
Any value longer than 4 bytes, used specifically for words, falls under the ELSE part, which strips the extra zeros from the hex value so that I get just the first byte of each 4-byte character (the only part that matters in my situation). By converting the binary value to a VARCHAR hex string, I can easily remove each run of 6 zeros using the REPLACE() function. With the zeros gone, I convert the string back to VARBINARY, which can then be cast directly to VARCHAR.
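The text branch can be traced the same way, using the Sensor value from the question. Again a sketch: @ByteValue stands in for the column, and VARCHAR(512) is chosen here so that the hex text of a full 255-byte value would fit. Note that this only works when every character fits in a single byte, as it does in this data.

```sql
DECLARE @ByteValue VARBINARY(255) =
    0x53000000650000006E000000730000006F00000072000000;  -- 'Sensor' as stored

SELECT
    -- Step 1: binary to hex text; style 1 keeps the '0x' prefix
    CONVERT(VARCHAR(512), @ByteValue, 1) AS HexText,
    -- Step 2: drop the three zero padding bytes after each character
    REPLACE(CONVERT(VARCHAR(512), @ByteValue, 1), '000000', '') AS Stripped,  -- '0x53656E736F72'
    -- Step 3: hex text back to binary (style 1 again), then to characters
    CAST(CONVERT(VARBINARY(255),
                 REPLACE(CONVERT(VARCHAR(512), @ByteValue, 1), '000000', ''),
                 1) AS VARCHAR(100)) AS PlainValue;                           -- 'Sensor'
```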