Unicode for Programmers
16-bit formats: UTF-16 and UCS-2 (char in Java; wchar_t in C on Windows, although wchar_t is 32 bits on most Unix systems)
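UCS-2 stores only code points U+0000 to U+FFFF, one 16-bit unit each. UTF-16 uses the same single unit for that range and encodes anything above U+FFFF as a surrogate pair. The sketch below shows that rule. It uses uint16_t rather than wchar_t because wchar_t's width varies by platform. The function name utf16_encode is hypothetical and not taken from any particular library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written,
   or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Surrogate code points and values past U+10FFFF cannot be encoded. */
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;

    if (cp <= 0xFFFF) {
        /* Basic Multilingual Plane: one 16-bit unit (same as UCS-2). */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: subtract 0x10000 to get a 20-bit value,
       then split it into a high surrogate and a low surrogate. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* bottom 10 bits */
    return 2;
}
```

For example, U+1F600 comes out as the pair 0xD83D 0xDE00. A UCS-2 program would have no way to represent this code point at all.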
8-bit format: UTF-8 (char in C)
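UTF-8 encodes each code point in 1 to 4 bytes. ASCII passes through unchanged as a single byte. The sketch below shows the standard bit layout. Plain char may be signed in C, so it works on unsigned char. The function name utf8_encode is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out
   (room for 4 bytes). Returns the byte count, or 0 if cp is not a
   valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;

    if (cp < 0x80) {                 /* 0xxxxxxx: ASCII unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, U+20AC (the euro sign) becomes the three bytes E2 82 AC. Because no byte of a multi-byte sequence is below 0x80, existing C string functions that look for ASCII characters such as '/' or '\0' keep working on UTF-8 text.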
Perl currently uses UTF-8 internally and can read UTF-16, ASCII, ISO-8859-1, and UTF-8
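Reading ISO-8859-1 into a UTF-8 internal form is the simplest of these conversions. Each Latin-1 byte is exactly the code point U+0000 to U+00FF with the same value. The C sketch below illustrates what that step involves. It is not Perl's own implementation; in Perl this is normally handled by an `:encoding(...)` I/O layer. The function name latin1_to_utf8 is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
   dst must have room for 2 * len bytes (the worst case). Returns the
   number of bytes written. No validation is needed: every Latin-1 byte
   is a valid code point in the range U+0000..U+00FF. */
size_t latin1_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = src[i];
        if (b < 0x80) {
            dst[n++] = b;                                   /* ASCII: unchanged */
        } else {
            dst[n++] = (unsigned char)(0xC0 | (b >> 6));    /* always 0xC2 or 0xC3 */
            dst[n++] = (unsigned char)(0x80 | (b & 0x3F));  /* low 6 bits */
        }
    }
    return n;
}
```

For example, the Latin-1 bytes for "café" (63 61 66 E9) become 63 61 66 C3 A9 in UTF-8, so the text grows by one byte.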
Go to www.unicode.org and buy the book!