I have a string in VB.net that may contain something like the following:
This is a 0x000020AC symbol
This is the UTF-32 encoding of the Euro symbol, according to this article: http://www.fileformat.info/info/unicode/char/20ac/index.htm
I'd like to convert this into
This is a € symbol
I've tried using the UnicodeEncoding class in VB.net (.NET Framework 2.0, as I'm modifying a legacy application).
When I use this class to encode and then decode, I still get back the original string.
I expected that UnicodeEncoding would recognise the already encoded part and not encode it again, but that doesn't appear to be the case.
I'm a little lost now as to how I can convert a mixed encoded string into a normal string.
Background: When an Excel spreadsheet is saved as CSV, anything outside the ASCII range gets converted to ?. So my idea is to get my client to search/replace a few characters, such as the Euro symbol, with an encoded string such as 0x000020AC, and then to convert those encoded parts back into the real symbols before I insert the data into a SQL database.
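The encoded form described here ("0x" followed by the code point as eight hex digits) can be produced programmatically, which may be handy for generating test input for a decoder. This is a minimal sketch, not part of the original post: the names HexEscapeEncoder and EncodeNonAscii are hypothetical, and it assumes every non-ASCII character should be escaped. It sticks to APIs available in .NET 2.0.

```vbnet
Imports System
Imports System.Text

Module HexEscapeEncoder
    ' Hypothetical helper: replaces every non-ASCII character with "0x"
    ' plus its code point as eight uppercase hex digits, e.g. 0x000020AC.
    Public Function EncodeNonAscii(ByVal s As String) As String
        Dim sb As New StringBuilder()
        Dim i As Integer = 0
        While i < s.Length
            If Convert.ToInt32(s(i)) < &H80 Then
                ' Plain ASCII: copy unchanged
                sb.Append(s(i))
                i += 1
            Else
                ' ConvertToUtf32 combines a surrogate pair into one code point
                ' (it throws on an unpaired surrogate)
                Dim codePoint As Integer = Char.ConvertToUtf32(s, i)
                sb.Append("0x").Append(codePoint.ToString("X8"))
                If Char.IsHighSurrogate(s(i)) Then i += 2 Else i += 1
            End If
        End While
        Return sb.ToString()
    End Function

    Sub Main()
        Dim text As String = "This is a " & Convert.ToChar(&H20AC) & " symbol"
        Console.WriteLine(EncodeNonAscii(text)) ' prints: This is a 0x000020AC symbol
    End Sub
End Module
```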
I've tried a function such as
```vbnet
Public Function Decode(ByVal s As String) As String
    Dim uni As New UnicodeEncoding()
    Dim encodedBytes As Byte() = uni.GetBytes(s)
    Dim output As String = ""
    output = uni.GetString(encodedBytes)
    Return output
End Function
```
It was based on the examples on MSDN at http://msdn.microsoft.com/en-us/library/system.text.unicodeencoding.aspx
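The round trip leaves the string unchanged because UnicodeEncoding treats "0x000020AC" as ten ordinary characters, not as an escape: GetBytes writes every character of the string as two UTF-16LE bytes, and GetString reads those bytes straight back. A minimal console sketch illustrating this (the module name EncodingRoundTrip is hypothetical):

```vbnet
Imports System
Imports System.Text

Module EncodingRoundTrip
    Sub Main()
        Dim s As String = "This is a 0x000020AC symbol"
        Dim uni As New UnicodeEncoding()

        ' GetBytes encodes each character of s, including the literal
        ' characters "0", "x", "0", ..., as 2 bytes (no BOM is added here).
        Dim bytes As Byte() = uni.GetBytes(s)
        Console.WriteLine(bytes.Length)   ' prints 54 (27 chars * 2 bytes)

        ' GetString reverses that exactly, so nothing is "decoded".
        Dim roundTrip As String = uni.GetString(bytes)
        Console.WriteLine(roundTrip = s)  ' prints True
    End Sub
End Module
```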
It could be that I have completely misunderstood how this works in VB.net. In C# I can simply use an escape sequence such as "\u20AC", but VB.net has no equivalent escape.
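For literals, a rough VB.NET counterpart to C#'s "\u20AC" is ChrW with a hex literal, or Char.ConvertFromUtf32 when you have a full code point. A minimal sketch (the module name EuroLiteral is hypothetical):

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module EuroLiteral
    Sub Main()
        ' ChrW turns a UTF-16 code unit (here the hex literal &H20AC) into a Char.
        Dim euro As Char = ChrW(&H20AC)

        ' Char.ConvertFromUtf32 (available since .NET 2.0) takes a code point
        ' and returns a String, so it also covers code points above U+FFFF,
        ' which need a surrogate pair.
        Dim euroFromCodePoint As String = Char.ConvertFromUtf32(&H20AC)

        Dim text As String = "This is a " & euro & " symbol"
        Console.WriteLine(text = "This is a " & euroFromCodePoint & " symbol") ' prints True
    End Sub
End Module
```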
Based on advice from Heinzi, I implemented a Regex.Replace approach using the following code, which appears to work for my examples:
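```vbnet
Imports System.Text
Imports System.Text.RegularExpressions

Public Function Decode(ByVal s As String) As String
    Dim output As String = ""
    ' "0x" followed by exactly eight hex digits, e.g. 0x000020AC
    Dim sRegex As String = "0x[0-9a-fA-F]{8}"
    Dim r As Regex = New Regex(sRegex)
    ' Each match is passed to HexToString and replaced by its return value
    Dim myEvaluator As MatchEvaluator = New MatchEvaluator(AddressOf HexToString)
    output = r.Replace(s, myEvaluator)
    Return output
End Function

Public Function HexToString(ByVal hexString As Match) As String
    ' Big-endian UTF-16, so the high-order byte comes first
    Dim uni As New UnicodeEncoding(True, True)
    Dim input As String = hexString.ToString
    input = input.Substring(2)  ' drop the "0x" prefix
    ' Keep at least four digits so code points below U+0100 still yield two bytes.
    ' Note: this byte-pair approach only handles code points up to U+FFFF.
    input = input.TrimStart("0"c).PadLeft(4, "0"c)
    Dim output As String
    Dim length As Integer = input.Length
    Dim upperBound As Integer = length \ 2
    If length Mod 2 = 0 Then
        upperBound -= 1
    Else
        input = "0" & input
    End If
    ' Convert each pair of hex digits into one byte
    Dim bytes(upperBound) As Byte
    For i As Integer = 0 To upperBound
        bytes(i) = Convert.ToByte(input.Substring(i * 2, 2), 16)
    Next
    output = uni.GetString(bytes)
    Return output
End Function
```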
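Because HexToString decodes the hex digits as big-endian UTF-16 bytes, it can only produce characters up to U+FFFF. A possible alternative, sketched here under the assumption that each 0xXXXXXXXX token holds a UTF-32 code point, parses the eight digits as an integer and passes it to Char.ConvertFromUtf32 (available since .NET 2.0). The names HexEscapeDecoder, DecodeHexEscapes and CodePointToString are hypothetical, and out-of-range values are left as-is rather than throwing.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module HexEscapeDecoder
    ' "0x" followed by exactly eight hex digits, captured in group 1
    Private ReadOnly HexEscapePattern As New Regex("0x([0-9A-Fa-f]{8})")

    Public Function DecodeHexEscapes(ByVal s As String) As String
        Return HexEscapePattern.Replace(s, New MatchEvaluator(AddressOf CodePointToString))
    End Function

    Private Function CodePointToString(ByVal m As Match) As String
        ' Eight hex digits parse as a 32-bit value; values of 80000000 and
        ' above come back negative, so they are rejected below.
        Dim codePoint As Integer = Integer.Parse(m.Groups(1).Value, NumberStyles.HexNumber)
        ' ConvertFromUtf32 throws for negatives, surrogates and anything
        ' above U+10FFFF; keep such matches unchanged instead.
        If codePoint < 0 OrElse codePoint > &H10FFFF OrElse _
           (codePoint >= &HD800 AndAlso codePoint <= &HDFFF) Then
            Return m.Value
        End If
        Return Char.ConvertFromUtf32(codePoint)
    End Function

    Sub Main()
        Dim decoded As String = DecodeHexEscapes("This is a 0x000020AC symbol")
        Console.WriteLine(decoded = "This is a " & ChrW(&H20AC) & " symbol") ' prints True
    End Sub
End Module
```

The capture group removes the need to strip the "0x" prefix by hand, and the single Regex instance can be reused across calls.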
Have you tried:
```vbnet
Public Function Decode(ByVal Coded As String) As String
    Return StrConv(Coded, vbUnicode)
End Function
```
Also, your original function does nothing useful: it takes s as an argument, encodes it to bytes and decodes those bytes straight back, so it returns the same string that was passed in rather than a converted one.
User contributions licensed under CC BY-SA 3.0