UTF-8
Encoding of UCS characters
  • All UCS characters > 0x7f (127) are encoded as multi-byte sequences in which every byte is >= 0x80
  • No problems with control characters: bytes in the ASCII range (0x00 - 0x7f) never occur inside a multi-byte sequence
  • The first byte of a multi-byte character is always in the range 0xc0 - 0xfd and announces the length of the byte sequence; all following bytes are in the range 0x80 - 0xbf
  • European languages usually use up to two bytes per character
  • Characters of most other languages (everything up to U+FFFF) use up to three bytes per character
  • The original UTF-8 design allows sequences of up to 6 bytes, but since Unicode ends at U+10FFFF, at most 4 bytes are used in practice
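
The rules above can be tried out with a minimal Python sketch. The helper name utf8_sequence_length is hypothetical (it is not part of any library), and it follows the original 1 to 6 byte scheme described on this slide rather than the stricter 4-byte limit used today.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length announced by a lead byte.

    Hypothetical helper for illustration; it follows the original
    1-6 byte scheme, not the modern 4-byte limit.
    """
    if lead < 0x80:
        return 1  # plain ASCII, single byte
    if lead < 0xC0:
        raise ValueError("0x80-0xbf is a continuation byte, not a lead byte")
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    if lead < 0xF8:
        return 4
    if lead < 0xFC:
        return 5
    if lead < 0xFE:
        return 6
    raise ValueError("0xfe and 0xff never appear in UTF-8")


# One sample character per length class: ASCII, Latin-1, BMP, beyond the BMP.
for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X}  {hex_bytes:<12} length {utf8_sequence_length(encoded[0])}")
```

Every byte printed for the three non-ASCII characters is >= 0x80, and the length derived from the first byte matches the number of bytes Python actually produced.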