- All UCS characters above 0x7f (127) are encoded as multi-byte sequences in which every byte is >= 0x80
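- No problems with control characters: bytes below 0x80 never occur inside a multi-byte sequence, so ASCII control characters cannot appear there by accident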
- The first byte of a multi-byte sequence is always in the range 0xc0 - 0xfd and indicates how many bytes the sequence contains; all following bytes are in the range 0x80 - 0xbf
- European languages usually use up to two bytes per character
- Other languages usually use up to three bytes per character
- The longest UTF-8 sequence can in principle be 6 bytes, but since RFC 3629 restricts UTF-8 to code points up to U+10FFFF, at most 4 bytes are used
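
The properties above can be checked with a short Python sketch. It is only an illustration, not a full decoder: the helper names `sequence_length` and `describe` are made up for this example, and the sample characters are arbitrary picks. The 5- and 6-byte lead-byte ranges follow the original UTF-8 design; Python's own encoder never produces them.

```python
def sequence_length(lead: int) -> int:
    """Return the sequence length announced by a UTF-8 lead byte.

    Hypothetical helper for illustration; it does not validate the
    continuation bytes or reject overlong encodings.
    """
    if lead < 0x80:
        return 1                      # plain ASCII, single byte
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    if 0xF8 <= lead <= 0xFB:
        return 5                      # original design only, unused today
    if 0xFC <= lead <= 0xFD:
        return 6                      # original design only, unused today
    # 0x80-0xBF are continuation bytes; 0xFE and 0xFF never occur.
    raise ValueError(f"0x{lead:02x} is not a valid lead byte")


def describe(ch: str) -> tuple[int, str]:
    """Return (number of bytes, hex dump) of one character in UTF-8."""
    encoded = ch.encode("utf-8")
    return len(encoded), " ".join(f"{b:02x}" for b in encoded)


# Arbitrary sample characters: ASCII, Latin-1 range, BMP, beyond the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    length, dump = describe(ch)
    # The lead byte alone tells us how long the sequence is.
    assert sequence_length(encoded[0]) == length
    # Every byte of a multi-byte sequence is >= 0x80.
    if length > 1:
        assert all(b >= 0x80 for b in encoded)
    print(f"U+{ord(ch):04X}: {length} byte(s): {dump}")
```

Because the lead byte carries the length, a reader can skip over a character without decoding it, and because continuation bytes stay in 0x80 - 0xbf, it can also resynchronise after landing in the middle of a sequence.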