Convert Unicode to UTF-32

How do I convert U+0065 to UTF-32 format?

U+0065
0000 0000 0110 0101

UTF-32
xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx

Convert U+0065 to UTF-32:

 0000 0000 0000 0000 0000 0000 0110 0101

The result in hex is 0x00000065.

Is that correct?

unicode
utf-32
asked on Stack Overflow May 7, 2015 by user4362081

1 Answer

Yes, it is correct.

UTF-32 always uses a single 32-bit code unit per code point. Unicode defines code points only up to U+10FFFF, which fits in 21 bits, so a UTF-32 code unit is simply the code point value itself, zero-padded to 32 bits.
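As a rough illustration of that rule, here is a minimal Python sketch. The helper name `utf32_code_unit` is made up for this example and is not a standard function. It only range-checks the code point and returns it unchanged, then compares the big-endian bytes against Python's built-in `utf-32-be` codec.

```python
def utf32_code_unit(code_point: int) -> int:
    """Return the UTF-32 code unit for a code point (hypothetical helper)."""
    # Unicode stops at U+10FFFF, so anything above that is not encodable.
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    # Surrogates (U+D800..U+DFFF) are reserved for UTF-16 and are not
    # valid scalar values in UTF-32.
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    # No transformation is needed: the code unit is the code point.
    return code_point


unit = utf32_code_unit(0x0065)
hex_form = f"0x{unit:08X}"          # 8 hex digits = 32 bits
bit_form = f"{unit:032b}"           # zero-padded to 32 binary digits
be_bytes = unit.to_bytes(4, "big")  # big-endian byte serialization

# Cross-check against Python's own UTF-32 (big-endian, no BOM) codec.
assert be_bytes == chr(0x0065).encode("utf-32-be")
print(hex_form, bit_form)
```

Note that 0x00000065 is the value of the code unit. How its four bytes are ordered in a file or on the wire depends on the byte order (UTF-32BE vs. UTF-32LE).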

Because U+0065 is in the U+0000..U+007F range, it is written in UTF-8 using a single byte (01100101). In UTF-16 it is the same value in one 16-bit code unit (00000000 01100101), and in UTF-32 in one 32-bit code unit (00000000 00000000 00000000 01100101).
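If you want to check those bit patterns yourself, a small sketch along these lines should do it. It uses Python's built-in codecs; `as_bits` is just a helper name made up for this example. The big-endian codec variants are chosen so the bytes come out in the same order as written above and no byte order mark is prepended.

```python
def as_bits(data: bytes) -> str:
    """Format bytes as space-separated groups of 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in data)


ch = "\u0065"  # U+0065, LATIN SMALL LETTER E

utf8_bits = as_bits(ch.encode("utf-8"))       # one byte
utf16_bits = as_bits(ch.encode("utf-16-be"))  # one 16-bit code unit
utf32_bits = as_bits(ch.encode("utf-32-be"))  # one 32-bit code unit

# The little-endian form holds the same value with the bytes reversed.
utf32_le_bytes = ch.encode("utf-32-le")

print(utf8_bits)
print(utf16_bits)
print(utf32_bits)
```

Using plain `"utf-16"` or `"utf-32"` instead would add a byte order mark in front of the encoded character, so the output would be longer than the patterns shown here.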

answered on Stack Overflow May 7, 2015 by Sébastien Le Callonnec • edited May 7, 2015 by Remy Lebeau

User contributions licensed under CC BY-SA 3.0