From Unicode 6.2:
Characters are the abstract representations of the smallest components of written language that have semantic value.
Unicode encodes characters, not glyphs!
Each character is identified by a code point (a number).
Glyphs are representations of characters when they are rendered or displayed.
Fonts are collections of glyphs.
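A minimal sketch of the character / code point relationship, using only Python's built-in `ord` and `chr` and the standard `unicodedata` module. Nothing here knows about glyphs or fonts; the variable names are chosen for illustration.

```python
import unicodedata

ch = "A"
code_point = ord(ch)              # character -> code point (an integer)
label = f"U+{code_point:04X}"     # conventional U+XXXX notation
name = unicodedata.name(ch)       # the character's Unicode name

print(label, name)
print(chr(code_point) == ch)      # code point -> character round trip
```

How the "A" looks on screen depends entirely on the font that supplies the glyph; the code point stays the same.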
AAAA | U+0041 Latin capital letter A |
→⇒⟹⇨☞⇰➲➽ | Arrows (U+2190-U+21FF) |
∏∑∉∰∫⨸⨂∞∪∩ | Mathematical Operators (U+2200-U+22FF) |
Miłowski | my name |
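Block ranges like the ones above can be checked directly against code points. The sketch below uses a hypothetical helper, `in_block`, that is not part of any library. Some of the arrow-like samples live in other blocks (☞ is U+261E, for instance), so expect a mix of `True` and `False`.

```python
def in_block(ch, first, last):
    """Return True if the single character ch has a code point in [first, last]."""
    return first <= ord(ch) <= last

ARROWS = (0x2190, 0x21FF)            # Arrows block
MATH_OPERATORS = (0x2200, 0x22FF)    # Mathematical Operators block

# Print each sample with its code point and whether it is in the Arrows block.
for ch in "→⇒⟹⇨☞⇰➲➽":
    print(ch, f"U+{ord(ch):04X}", in_block(ch, *ARROWS))
```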
ابجدهوزحطيكلمنسعفصقرشتثخذضظغ
ا ب ج د ه و ز ح ط ي ك ل م ن س ع ف ص ق ر ش ت ث خ ذ ض ظ غ
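The two lines above show the same letters, run together and separated by spaces. A small sketch with the first four letters (written as escapes so the code points are unambiguous) suggests why they look different: the characters are identical, and only the glyphs chosen by the renderer change with context.

```python
joined = "\u0627\u0628\u062C\u062F"        # alef, beh, jeem, dal run together
spaced = "\u0627 \u0628 \u062C \u062F"     # the same letters separated by spaces

# List the code points, ignoring the spaces in the second string.
letters_joined = [f"U+{ord(c):04X}" for c in joined]
letters_spaced = [f"U+{ord(c):04X}" for c in spaced if c != " "]

print(letters_joined)
print(letters_joined == letters_spaced)    # same characters, different glyphs on screen
```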
See the reference charts.
Each chart has:
Examples: Greek or Cuneiform Numbers and Punctuation
Many editors will let you just insert any character directly!
A few methods:
There are 1,114,112 code points (U+0000 to U+10FFFF). The first 65,536 form the Basic Multilingual Plane (16 bits); covering all of them takes 21 bits, so in practice you really need a 32-bit integer ...
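The arithmetic behind these numbers can be sketched in a few lines. The constant names and the `plane_of` helper are illustrative, not part of any library.

```python
MAX_CODE_POINT = 0x10FFFF
PLANE_SIZE = 0x10000                         # 65,536 code points per plane

total = MAX_CODE_POINT + 1                   # number of code points
planes = total // PLANE_SIZE                 # number of planes
bits_needed = MAX_CODE_POINT.bit_length()    # bits for the largest code point

def plane_of(ch):
    """Plane number of a single character; 0 is the Basic Multilingual Plane."""
    return ord(ch) // PLANE_SIZE

print(total, planes, bits_needed)
print(plane_of("A"), plane_of("\U00012000"))  # a Latin letter and a cuneiform sign
```

Anything outside plane 0 does not fit in a single 16-bit unit, which is why a 16-bit "character" type is not enough.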
Unicode characters are encoded
into a byte sequence:
Many systems get this wrong by default, probably including the operating system of everyone in this class.
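As a rough illustration of encoding and of what goes wrong with a mismatched default, the sketch below encodes one string with the three standard Unicode encoding forms (UTF-8, UTF-16, UTF-32) and then decodes the UTF-8 bytes with the wrong encoding. Latin-1 is an assumption here, standing in for whatever legacy default a system might use.

```python
text = "Mi\u0142owski"                 # "Miłowski", 8 characters

utf8 = text.encode("utf-8")            # 1 to 4 bytes per character
utf16 = text.encode("utf-16-le")       # 2 or 4 bytes per character
utf32 = text.encode("utf-32-le")       # always 4 bytes per character

print(len(text), len(utf8), len(utf16), len(utf32))
print(utf8)

# Decoding with the wrong encoding does not fail; it silently garbles the text.
wrong = utf8.decode("latin-1")
print(repr(wrong))
print(utf8.decode("utf-8") == text)    # the right encoding round-trips
```

The byte sequence only means something together with the name of the encoding that produced it.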