Mixed Encoding to String

1

I have a string in VB.net that may contain something like the following:

This is a 0x000020AC symbol

This is the UTF-32 encoding for the Euro Symbol according to this article http://www.fileformat.info/info/unicode/char/20ac/index.htm

I'd like to convert this into

This is a € symbol

I've tried using the UnicodeEncoding class in VB.NET (Framework 2.0, as I'm modifying a legacy application).

When I use this class to encode and then decode, I still get back the original string.

I expected that UnicodeEncoding would recognise the already encoded part and not encode it again, but that appears not to be the case.

I'm a little lost now as to how I can convert a mixed encoded string into a normal string.

Background: When saving an Excel spreadsheet as CSV, anything outside the ASCII range gets converted to ?. My idea is to have my client search and replace a few characters, such as the Euro symbol, with an encoded string such as 0x000020AC. I would then convert those encoded parts back into the real symbols before inserting into a SQL database.

I've tried a function such as

Public Function Decode(ByVal s As String) As String
    Dim uni As New UnicodeEncoding()
    Dim encodedBytes As Byte() = uni.GetBytes(s)
    Dim output As String = ""

    output = uni.GetString(encodedBytes)

    Return output
End Function

This was based on the examples on MSDN at http://msdn.microsoft.com/en-us/library/system.text.unicodeencoding.aspx

It could be that I have a complete misunderstanding of how this works in VB.NET. In C# I can simply use escaped characters such as "\u20AC", but no such thing exists in VB.NET.
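For comparison, a character can still be spliced into a VB.NET string by code point, even without escape sequences. The sketch below assumes only the built-in ChrW function and Char.ConvertFromUtf32 (available since .NET Framework 2.0); the module name EscapeDemo is illustrative.

```vbnet
Imports System

Module EscapeDemo
    Sub Main()
        ' ChrW takes a single UTF-16 code unit, so it covers U+0000 through U+FFFF.
        Dim euro As String = "This is a " & ChrW(&H20AC) & " symbol"

        ' Char.ConvertFromUtf32 accepts any valid code point and returns a String
        ' (a surrogate pair for values above U+FFFF).
        Dim sameEuro As String = "This is a " & Char.ConvertFromUtf32(&H20AC) & " symbol"

        ' Both expressions build the same string, so this prints True.
        Console.WriteLine(euro = sameEuro)
    End Sub
End Module
```

Neither call parses text such as 0x000020AC; they only help once the hex digits have been turned into a number.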

vb.net
encoding
decoding
asked on Stack Overflow Aug 2, 2012 by Elarys • edited Aug 2, 2012 by Elarys

2 Answers

1

Based on advice from Heinzi, I implemented a Regex.Replace method using the following code. This appears to work for my examples.

Imports System.Text
Imports System.Text.RegularExpressions

Public Function Decode(ByVal s As String) As String
 ' Match "0x" followed by exactly eight hexadecimal digits.
 Dim r As Regex = New Regex("0x[0-9a-fA-F]{8}")

 ' Replace every match with the character it encodes.
 Dim myEvaluator As MatchEvaluator = New MatchEvaluator(AddressOf HexToString)

 Return r.Replace(s, myEvaluator)
End Function

Public Function HexToString(ByVal hexString As Match) As String
 ' Big-endian UTF-32 without a byte order mark: the eight hex digits
 ' map directly onto the four bytes of a single code point.
 Dim utf32 As New UTF32Encoding(True, False)

 ' Drop the "0x" prefix and keep all eight digits, leading zeros included.
 Dim input As String = hexString.Value.Substring(2)

 ' Convert each pair of hex digits into one byte.
 Dim bytes(3) As Byte
 For i As Integer = 0 To 3
  bytes(i) = Convert.ToByte(input.Substring(i * 2, 2), 16)
 Next

 Return utf32.GetString(bytes)
End Function
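
A shorter variant, sketched here as an alternative rather than as the approach used above, skips the byte array: parse the eight hex digits as one integer and pass it to Char.ConvertFromUtf32 (available since .NET Framework 2.0). It assumes the escape format is always 0x plus eight hex digits, and it leaves values that are not valid code points untouched. The names CodePointDecoder, DecodeCodePoints and CodePointToString are illustrative.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CodePointDecoder
    ' Matches "0x" followed by exactly eight hex digits (the assumed escape format).
    Private ReadOnly EscapePattern As New Regex("0x([0-9a-fA-F]{8})")

    Public Function DecodeCodePoints(ByVal s As String) As String
        Return EscapePattern.Replace(s, New MatchEvaluator(AddressOf CodePointToString))
    End Function

    Private Function CodePointToString(ByVal m As Match) As String
        ' Parse as Int64 so that values such as FFFFFFFF do not wrap to a negative number.
        Dim codePoint As Long = Convert.ToInt64(m.Groups(1).Value, 16)

        ' Leave the text as-is when the value is not a valid code point:
        ' above U+10FFFF, or inside the surrogate range U+D800 to U+DFFF.
        If codePoint > &H10FFFF OrElse (codePoint >= &HD800 AndAlso codePoint <= &HDFFF) Then
            Return m.Value
        End If

        ' Returns one Char for U+0000 to U+FFFF, a surrogate pair above that.
        Return Char.ConvertFromUtf32(CInt(codePoint))
    End Function
End Module
```

With this sketch, DecodeCodePoints("This is a 0x000020AC symbol") returns "This is a € symbol".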
answered on Stack Overflow Aug 2, 2012 by Elarys
0

Have you tried:

Public Function Decode(Byval Coded as string) as string
     Return StrConv(Coded, vbUnicode)
End Function

Also, your function is invalid. It takes s as an argument, does a load of stuff and then outputs the s that was put into it instead of the stuff that was processed within it.

answered on Stack Overflow Aug 2, 2012 by Pharap

User contributions licensed under CC BY-SA 3.0