Short URL: https://scripts.sil.org/UnicodeNames
Unicode Transition Training
Unicode Names
Kent Spielmann, 2005-08-10
If you know how Unicode characters are named you can use tools like Windows Character Map to find the characters you want more easily. This page focuses on how Unicode letters are named, especially those in the Latin character set.
Letter Names
Letters are the basic constituents of all texts written in alphabetic writing systems. Letters contrast with punctuation, combining diacritics, digits (numbers), ideographs (e.g. Chinese), syllables (e.g. Korean, Ethiopic), and various symbols, signs, and shapes.
Unicode names are officially in uppercase and in English, but are not case sensitive. The names may only use the letters A to Z, the digits 0 to 9, space, and hyphen.
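As a rough illustration of these two rules, the following Python sketch uses the standard `unicodedata` module. The helper name `name_uses_allowed_characters` and the regular expression are assumptions made for this example; they are not part of Unicode or of any SIL tool.

```python
import re
import unicodedata

# Character set described above: A to Z, 0 to 9, space, and hyphen.
# (The pattern itself is an assumption of this sketch.)
ALLOWED = re.compile(r"[A-Z0-9 -]+")

def name_uses_allowed_characters(ch):
    """Return True if ch has a Unicode name made only of the allowed characters."""
    name = unicodedata.name(ch, "")  # "" for characters without a name
    return bool(name) and ALLOWED.fullmatch(name) is not None

# Lookup by name ignores case, although the official spelling is uppercase.
print(unicodedata.lookup("latin capital letter b"))  # B
print(unicodedata.name("\u03b2"))                    # GREEK SMALL LETTER BETA
print(name_uses_allowed_characters("\u03b2"))        # True
```

Searching by name in a character picker works the same way: typing the name in lowercase generally finds the same character.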
Regular Letter Names
Most letter names contain the following four elements in order:
- Language or writing system name
- Case, if the writing system has that distinction
- Class LETTER
- Letter name
For the Latin system the letter name is usually just the letter. In other systems the name is usually written out.
Examples:
Writing system | Case | Class | Letter name | Glyph | Code point
LATIN | CAPITAL | LETTER | B | B | 0042
GREEK | SMALL | LETTER | BETA | β | 03B2
CYRILLIC | CAPITAL | LETTER | BE | Б | 0411
ARMENIAN | SMALL | LETTER | BEN | բ | 0562
DEVANAGARI | | LETTER | BA | ब | 092C
Unicode names for 'Bs' in different scripts
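The four-element pattern can be checked mechanically. The sketch below splits a regular letter name into writing system, case, and letter name; the function `split_letter_name` is a hypothetical helper written for this page, and it assumes the name follows the regular pattern described above.

```python
import unicodedata

def split_letter_name(ch):
    """Split a regular letter name into (writing system, case or None, letter name)."""
    name = unicodedata.name(ch)
    before, sep, letter = name.partition(" LETTER ")
    if not sep:
        raise ValueError(f"{name!r} does not contain the class word LETTER")
    words = before.split()
    # Case is present only for writing systems that have the distinction.
    case = words[-1] if words[-1] in ("CAPITAL", "SMALL") else None
    system = " ".join(words[:-1] if case else words)
    return system, case, letter

for ch in "B\u03b2\u0411\u0562\u092c":
    print(f"{ord(ch):04X}", split_letter_name(ch))
```

Names that do not follow the regular pattern, such as LATIN LETTER SMALL CAPITAL R, still split at LETTER, but the third element then holds the modifier words as well as the letter.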
Note
Letters that are part of writing systems always have the word LETTER in their name. Do not use Unicode letter-like shapes that do not have LETTER in their name (e.g. U+00A2 CENT SIGN) for orthographies. Also, never use the Fullwidth or Halfwidth letters, which exist only for backward compatibility with older encodings.
Written Out Latin Letter Names
The Latin writing system includes a number of letters whose basic names are written out. Examples:
LATIN CAPITAL LETTER | AE | Æ | 00C6
LATIN CAPITAL LETTER | ENG | Ŋ | 014A
LATIN SMALL LETTER | EZH | ʒ | 0292
LATIN SMALL LETTER | HENG | ɧ | 0267
LATIN SMALL LETTER | IOTA | ɩ | 0269
Latin letters with written out names
A few IPA letters have a phonetic description as their Unicode name. Examples:
LATIN LETTER | BIDENTAL PERCUSSIVE | ʭ | 02AD
LATIN LETTER | BILABIAL CLICK | ʘ | 0298
LATIN LETTER | GLOTTAL STOP | ʔ | 0294
LATIN CAPITAL LETTER | GLOTTAL STOP | Ɂ | 0241
LATIN LETTER | PHARYNGEAL VOICED FRICATIVE | ʕ | 0295
LATIN LETTER | VOICED LARYNGEAL SPIRANT | ᴤ | 1D24
IPA letters with phonetic names
Case Pairs
In the Latin writing system, most letters are specified as either CAPITAL (uppercase) or SMALL (lowercase) and generally have a corresponding lower/uppercase equivalent character. This means that if you “toggle” the case of a word, most Latin letters change their case. Those letters not specified for case in their name generally do not have a corresponding upper/lowercase equivalent, and therefore do not change.
The following tables give exceptions to the rule:
U+01F7 LATIN CAPITAL LETTER WYNN | U+01BF LATIN LETTER WYNN
U+01A6 LATIN LETTER YR | U+0280 LATIN LETTER SMALL CAPITAL R
Upper/lowercase pairs where one member is not specified for case
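The effect of "toggling" case can be tried in Python, whose string methods use the case mappings of the Unicode version bundled with the interpreter. This is only a sketch; `case_partner` is a hypothetical helper, and the results assume a reasonably current Unicode database.

```python
def case_partner(ch):
    """Return the other-case form of ch, or None if changing case leaves it unchanged."""
    for other in (ch.upper(), ch.lower()):
        if other != ch:
            return other
    return None

# A regular pair: names say CAPITAL and SMALL.
print(case_partner("b"))                 # B
# Not specified for case in its name, and no partner.
print(case_partner("\u0294"))            # None (LATIN LETTER GLOTTAL STOP)
# The exceptions: one member of the pair is not specified for case.
print(case_partner("\u01bf") == "\u01f7")  # True (WYNN)
print(case_partner("\u0280") == "\u01a6")  # True (SMALL CAPITAL R / YR)
```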
Note
In Unicode 5.0, Capital Glottal Stop was given a lowercase version. The case-neutral version should be used for phonetic writing systems that do not need this distinction, such as IPA.
U+0241 LATIN CAPITAL LETTER GLOTTAL STOP |
U+0242 LATIN SMALL LETTER GLOTTAL STOP |
U+0294 LATIN LETTER GLOTTAL STOP |
Three-way distinction for Glottal Stop
When this page was written, two CAPITAL letters used for SENĆOŦEN (British Columbia) had no corresponding lowercase equivalent, because SENĆOŦEN orthography does not use lowercase letters. Unicode 5.0 later added lowercase counterparts (U+2C65 and U+2C66).
U+023A LATIN CAPITAL LETTER A WITH STROKE,
U+023E LATIN CAPITAL LETTER T WITH DIAGONAL STROKE
Unicode SMALL CAPITAL letters do not have lowercase equivalents. They should be used only for phonetic transcriptions; otherwise, use character formatting.
U+0262 LATIN LETTER SMALL CAPITAL G | IPA: Voiced uvular plosive
U+026A LATIN LETTER SMALL CAPITAL I | IPA: Near-close near-front unrounded vowel
U+0274 LATIN LETTER SMALL CAPITAL N | IPA: Voiced uvular nasal
U+0276 LATIN LETTER SMALL CAPITAL OE | IPA: Open front rounded vowel
U+0280 LATIN LETTER SMALL CAPITAL R | IPA: Voiced uvular trill
U+0281 LATIN LETTER SMALL CAPITAL INVERTED R | IPA: Voiced uvular fricative
U+028F LATIN LETTER SMALL CAPITAL Y | IPA: Near-close near-front rounded vowel
U+0299 LATIN LETTER SMALL CAPITAL B | IPA: Voiced bilabial trill
U+029B LATIN LETTER SMALL CAPITAL G WITH HOOK | IPA: Voiced uvular implosive
U+029C LATIN LETTER SMALL CAPITAL H | IPA: Voiceless epiglottal fricative
U+029F LATIN LETTER SMALL CAPITAL L | IPA: Voiced velar lateral approximant
U+1D00–U+1D2B, etc. | Uralic Phonetic Alphabet (UPA)
Uppercase letters without lowercase equivalents
SMALL (lowercase) letters without a corresponding uppercase equivalent in Unicode (besides phonetic characters):
U+00DF LATIN SMALL LETTER SHARP S | German: Uppercase is SS (double capital S)
U+0138 LATIN SMALL LETTER KRA | Greenlandic (old orthography): Uppercase is K’ (K + U+02BC MODIFIER LETTER APOSTROPHE)
Lowercase letters without uppercase Unicode equivalents
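Software that changes case shows the same behaviour. In the Python sketch below, `single_uppercase` is a hypothetical helper that returns an uppercase form only when it is a single, different character; the results assume the default case mappings of the Unicode data shipped with Python.

```python
def single_uppercase(ch):
    """Return the one-character uppercase form of ch, or None if there is none."""
    upper = ch.upper()
    return upper if len(upper) == 1 and upper != ch else None

print("\u00df".upper())            # SS (two letters, not a single uppercase character)
print(single_uppercase("\u00df"))  # None
print(single_uppercase("\u0138"))  # None (KRA is left unchanged)
print(single_uppercase("a"))       # A
```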
Modified Letters
A letter may be modified in two basic ways, and either or both may apply to the same letter.
- Overall change: If the basic letter is modified in an overall manner, the name of the modification is put before the letter name.
- Added feature change: If accents or other additional strokes or hooks are added to the basic letter, WITH is put after the basic letter name, followed by the names of the features.
- Both of the above changes: If both of the above modifications are applied to a letter, its Unicode name is modified accordingly.
This is easier to see than explain, so look at these examples drawn from the Latin writing system, which has the most modified letters. Lists of other modifiers follow the examples.
Overall Modifications
Overall modifications to a letter are normally indicated by a word or words that precede the letter name. Examples:
LATIN LETTER | INVERTED | GLOTTAL STOP | ʖ | 0296
LATIN SMALL LETTER | REVERSED | OPEN E | ɜ | 025C
LATIN SMALL LETTER | CLOSED REVERSED | OPEN E | ɞ | 025E
LATIN SMALL LETTER | TURNED | H | ɥ | 0265
LATIN LETTER | STRETCHED | C | ʗ | 0297
LATIN LETTER | SMALL CAPITAL | N | ɴ | 0274
LATIN LETTER | SMALL CAPITAL INVERTED | R | ʁ | 0281
Examples of overall modifications to Latin letter names
This table shows the most common overall modifications that precede letter names.
Small Capital | LATIN LETTER SMALL CAPITAL R | 0280
Small Capital | LATIN LETTER SMALL CAPITAL OE | 0276
Small Capital | LATIN LETTER SMALL CAPITAL G WITH HOOK | 029B
Turned | LATIN SMALL LETTER TURNED ALPHA | 0252
Turned | LATIN SMALL LETTER TURNED T | 0287
Turned | LATIN SMALL LETTER TURNED Y | 028E
Reversed | LATIN CAPITAL LETTER REVERSED E | 018E
Reversed | LATIN SMALL LETTER REVERSED E | 0258
Reversed | LATIN SMALL LETTER SQUAT REVERSED ESH | 0285
Reversed | LATIN SMALL LETTER CLOSED REVERSED OPEN E | 025E
Modifications that precede letter names
This table shows additional overall modifiers that are restricted to certain letters in the Latin script.
Barred | o | LATIN SMALL LETTER BARRED O | 0275
Closed | Omega | LATIN SMALL LETTER CLOSED OMEGA | 0277
Closed | Open E | LATIN SMALL LETTER CLOSED OPEN E | 029A
Dotless | I | LATIN SMALL LETTER DOTLESS I | 0131
Dotless | J | LATIN SMALL LETTER DOTLESS J WITH STROKE | 025F
Inverted | Glottal Stop | LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE | 01BE
Long | S | LATIN SMALL LETTER LONG S | 017F
Open | O | LATIN CAPITAL LETTER OPEN O | 0186
Open | o | LATIN SMALL LETTER OPEN O | 0254
Open | E | LATIN CAPITAL LETTER OPEN E | 0190
Open | e | LATIN SMALL LETTER OPEN E | 025B
Stretched | C | LATIN LETTER STRETCHED C | 0297
Sharp | S | LATIN SMALL LETTER SHARP S | 00DF
Script | g | LATIN SMALL LETTER SCRIPT G | 0261
Subscript | a | LATIN SUBSCRIPT SMALL LETTER A | 2090
Subscript | e | LATIN SUBSCRIPT SMALL LETTER E | 2091
Subscript | o | LATIN SUBSCRIPT SMALL LETTER O | 2092
Subscript | x | LATIN SUBSCRIPT SMALL LETTER X | 2093
Subscript | schwa | LATIN SUBSCRIPT SMALL LETTER SCHWA | 2094
Superscript | i | SUPERSCRIPT LATIN SMALL LETTER I | 2071
Superscript | n | SUPERSCRIPT LATIN SMALL LETTER N | 207F
Modifications restricted to certain letters
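Because overall modifiers precede the letter name, they can be used as search terms. The sketch below lists Latin characters whose name contains a given modifier before any WITH clause. The function name, and the default search range of U+0000 to U+02FF, are assumptions chosen for this example; the exact list returned depends on the Unicode version bundled with Python.

```python
import unicodedata

def latin_letters_with_modifier(modifier, start=0x0000, stop=0x0300):
    """List (code point, name) for Latin names where `modifier` precedes any WITH clause."""
    found = []
    for cp in range(start, stop):
        name = unicodedata.name(chr(cp), "")
        if not name.startswith("LATIN "):
            continue
        head = name.split(" WITH ")[0]       # drop added features
        if f" {modifier} " in f" {head} ":   # match whole words only
            found.append((f"{cp:04X}", name))
    return found

for code, name in latin_letters_with_modifier("TURNED"):
    print(code, name)
```

Searching for LONG this way finds LATIN SMALL LETTER LONG S but skips LATIN SMALL LETTER R WITH LONG LEG, because there LONG describes an added feature rather than an overall change.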
Added Feature Modifications
Accents, strokes, or hooks added to the basic letter are normally indicated by adding WITH after the letter name, followed by the name of the feature or features.
Features that follow letter names (preceded by WITH)
Feature | Variant | Example name | Notes
Acute | | LATIN CAPITAL LETTER E WITH ACUTE |
Acute | Double | LATIN CAPITAL LETTER O WITH DOUBLE ACUTE |
Bar | | LATIN SMALL LETTER U BAR | See also Combining Stroke
Bar | Top | LATIN CAPITAL LETTER B WITH TOPBAR |
Belt | | LATIN SMALL LETTER L WITH BELT |
Breve | | LATIN SMALL LETTER E WITH BREVE |
Breve | Below | LATIN CAPITAL LETTER H WITH BREVE BELOW |
Breve | Inverted | LATIN CAPITAL LETTER E WITH INVERTED BREVE |
Caron | | LATIN CAPITAL LETTER C WITH CARON |
Cedilla | | LATIN SMALL LETTER C WITH CEDILLA |
Circumflex | | LATIN SMALL LETTER O WITH CIRCUMFLEX |
Circumflex | Below | LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW |
Circle | | | See ring
Comma Below | | LATIN CAPITAL LETTER S WITH COMMA BELOW |
Curl | | LATIN SMALL LETTER C WITH CURL | Compare with Esh Loop
Diaeresis | | LATIN SMALL LETTER U WITH DIAERESIS |
Dot | Above | LATIN SMALL LETTER G WITH DOT ABOVE |
Dot | Below | LATIN SMALL LETTER D WITH DOT BELOW |
Grave | | LATIN SMALL LETTER E WITH GRAVE |
Grave | Double | LATIN CAPITAL LETTER A WITH DOUBLE GRAVE |
Hacek | | | See caron
Hat | | | See circumflex
Hook | | LATIN CAPITAL LETTER B WITH HOOK |
Hook | Above | LATIN CAPITAL LETTER A WITH HOOK ABOVE |
Hook | Retroflex | LATIN SMALL LETTER L WITH RETROFLEX HOOK |
Hook | Palatal | LATIN SMALL LETTER T WITH PALATAL HOOK |
Hook | Fishhook | LATIN SMALL LETTER R WITH FISHHOOK |
Hook | Left | LATIN SMALL LETTER N WITH LEFT HOOK |
Horn | | LATIN CAPITAL LETTER O WITH HORN |
Long Leg | | LATIN SMALL LETTER R WITH LONG LEG |
Long Leg | Right | LATIN SMALL LETTER N WITH LONG RIGHT LEG |
Line | | | See also bar and stroke
Line Below | | LATIN SMALL LETTER N WITH LINE BELOW |
Macron | | LATIN SMALL LETTER E WITH MACRON |
Ogonek | | LATIN SMALL LETTER I WITH OGONEK | Used in Polish as well as some Americanist phonetic systems for nasalized
Ring | Above | LATIN CAPITAL LETTER A WITH RING ABOVE |
Ring | Below | LATIN CAPITAL LETTER A WITH RING BELOW |
Ring | Right Half | LATIN SMALL LETTER A WITH RIGHT HALF RING |
Slash | | | See stroke
Stroke | | LATIN CAPITAL LETTER D WITH STROKE, LATIN CAPITAL LETTER L WITH STROKE | See also bar
Stroke | Vertical | CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE |
Stroke | Diagonal | LATIN CAPITAL LETTER T WITH DIAGONAL STROKE | See also Combining Solidus Overlay
Tail | | LATIN SMALL LETTER EZH WITH TAIL |
Tail | Crossed | LATIN SMALL LETTER J WITH CROSSED-TAIL | See also curl
Tail | Swash | LATIN SMALL LETTER Z WITH SWASH TAIL |
Tilde | | LATIN SMALL LETTER I WITH TILDE |
Tilde | Below | LATIN SMALL LETTER E WITH TILDE BELOW |
Tilde | Middle | LATIN SMALL LETTER L WITH MIDDLE TILDE |
Umlaut | | | See diaeresis
Features that follow letter names
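The WITH convention makes it possible to separate the base letter from its added features. The sketch below is one way to do so; `split_features` is a hypothetical helper, and it relies on the observation (not stated on this page) that names with several features join them with AND, as in LATIN SMALL LETTER U WITH DIAERESIS AND MACRON.

```python
import unicodedata

def split_features(name):
    """Split a name into (base name, list of features named after WITH)."""
    base, sep, features = name.partition(" WITH ")
    return base, (features.split(" AND ") if sep else [])

print(split_features(unicodedata.name("\u00c9")))
# ('LATIN CAPITAL LETTER E', ['ACUTE'])
print(split_features(unicodedata.name("\u01d6")))
# ('LATIN SMALL LETTER U', ['DIAERESIS', 'MACRON'])
print(split_features(unicodedata.name("\u0289")))
# ('LATIN SMALL LETTER U BAR', [])
```

The last result shows why older names such as U BAR are harder to find by feature: without WITH, the feature is indistinguishable from the letter name.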
Exceptions
Some older Unicode names are not consistent with the current naming practice. For example:
Barred | U+0275 LATIN SMALL LETTER BARRED O | should be O WITH HORIZONTAL STROKE; cf. U+00F8 LATIN SMALL LETTER O WITH STROKE
Bar | U+0289 LATIN SMALL LETTER U BAR | should be U WITH STROKE
Loop | U+01AA LATIN LETTER REVERSED ESH LOOP | should be ESH WITH LOOP
Reversed | U+01B8 LATIN CAPITAL LETTER EZH REVERSED | should be REVERSED EZH
Exceptions to Unicode naming conventions
Modifier Letters
Modifier letters are small raised or lowered letters and other symbols that generally do not stand alone and usually modify the preceding regular letter or base character. In some cases modifier letters do represent a separate sound, such as U+02BC MODIFIER LETTER APOSTROPHE when used as a glottal stop. Sometimes they are also associated with the following letter(s), such as when marking stress.
Modifier letters behave differently from regular letters and symbols.
- Modifier letters that have letter shapes do not have case pairs. Thus, if you “toggle” the case of a word, modifier letters will not change case as would superscripted Latin letters.
- Modifier letters that look like symbols or punctuation are not considered to be word break characters when wrapping lines of text or selecting words.
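Both behaviours can be observed in Python, used here only as one example of Unicode-aware software. The helper `words` is hypothetical; it relies on the fact that Python's `\w` matches characters Unicode classifies as letters, which includes modifier letters (general category Lm).

```python
import re
import unicodedata

def words(text):
    """Split text into words using Python's Unicode-aware \\w class."""
    return re.findall(r"\w+", text)

# Modifier letters are letters (category Lm) but have no case pair.
print(unicodedata.category("\u02b0"))   # Lm (MODIFIER LETTER SMALL H)
print("\u02b0".upper() == "\u02b0")     # True

# MODIFIER LETTER APOSTROPHE stays inside the word; the ASCII apostrophe splits it.
print(words("ha\u02bca"))   # one word
print(words("ha'a"))        # ['ha', 'a']
```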
Modifier letters should be used for the raised letters and symbols used in IPA and other phonetic transcription systems. They can be thought of as spacing diacritics, in contrast to combining diacritics, which are non-spacing. Also, use U+02BC MODIFIER LETTER APOSTROPHE for an apostrophe that represents a glottal stop in a regular orthography.
U+02B0 MODIFIER LETTER SMALL H | aspirated
U+02B1 MODIFIER LETTER SMALL H WITH HOOK | breathy voiced release
U+02B2 MODIFIER LETTER SMALL J | IPA: palatalized
U+02B7 MODIFIER LETTER SMALL W | labialized
U+02B8 MODIFIER LETTER SMALL Y | Amer: palatalized
U+02E0 MODIFIER LETTER SMALL GAMMA | IPA: velarized
U+02E1 MODIFIER LETTER SMALL L | IPA: lateral release
U+02E4 MODIFIER LETTER SMALL REVERSED GLOTTAL STOP | IPA: pharyngealized; Amer: implosive
U+02B9 MODIFIER LETTER PRIME | Amer: stress, secondary stress
U+02BA MODIFIER LETTER DOUBLE PRIME | Amer: primary stress (when secondary stress is marked)
U+02BC MODIFIER LETTER APOSTROPHE | IPA: ejective; Orthographic: glottal stop
U+02C8 MODIFIER LETTER VERTICAL LINE | IPA: primary stress
U+02CC MODIFIER LETTER LOW VERTICAL LINE | IPA: secondary stress
U+02D0 MODIFIER LETTER TRIANGULAR COLON | IPA: long
U+02D1 MODIFIER LETTER HALF TRIANGULAR COLON | IPA: half-long
Some commonly used modifier letters
Unicode versions 4.0 and 4.1 added nearly 100 modifier letters intended for representing secondary articulations, such as diphthongs, releases, glides, epenthetic sounds, etc. To see them, look at the characters at U+1D43..U+1D61 and U+1D9B..U+1DBF in an SIL font such as Doulos SIL.
Tone Bars
Tone bars are modifier letters used in phonetic transcriptions of tonal languages, particularly in Asia. There are five tone bar letters, U+02E5 to U+02E9, which individually display as horizontal tone bars with a vertical staff on the right. Tone bars have special properties: when using a smart font like Doulos SIL, two or three sequential tone bar letters combine to create a contour.
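A contour is therefore encoded simply as a sequence of tone bar letters; the font does the joining. The sketch below builds such a sequence from tone-level digits. The digit convention (1 for the lowest level, 5 for the highest) and the function name `tone_contour` are assumptions of this example, not part of Unicode.

```python
import unicodedata

def tone_contour(levels):
    """Convert a string of level digits (1 = lowest, 5 = highest) to tone bar letters."""
    bars = []
    for digit in levels:
        level = int(digit)
        if not 1 <= level <= 5:
            raise ValueError(f"tone level out of range: {digit!r}")
        # U+02E5 is the highest bar and U+02E9 the lowest.
        bars.append(chr(0x02EA - level))
    return "".join(bars)

falling = tone_contour("51")
for ch in falling:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+02E5 MODIFIER LETTER EXTRA-HIGH TONE BAR
# U+02E9 MODIFIER LETTER EXTRA-LOW TONE BAR
```

Whether the two bars display as a single falling contour depends on the font and rendering system, as described above.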
Combining characters
Combining Unicode characters are non-spacing characters of two basic types:
- Those that rest above or below the preceding character (overstrike and understrike), for example accents
- Those that are actually superimposed upon or connect to the preceding character (overlay and combining hooks)
Overstrike and understrike diacritics can be used to create letter-diacritic combinations that are not in Unicode. They may also be used as an alternate way of encoding precomposed letter-diacritic combinations.
Combining overlay and hook characters should be avoided since they often do not combine well with base characters and cause other typographical problems. Unicode now has a policy of approving precomposed versions of all necessary overlay and hook combination characters. Compare:
Composite overstrike | a + U+0301 COMBINING ACUTE ACCENT | á
Precomposed | U+00E1 LATIN SMALL LETTER A WITH ACUTE | á
Composite overlay | h + U+0335 COMBINING SHORT STROKE OVERLAY | h̵
Precomposed | U+0127 LATIN SMALL LETTER H WITH STROKE | ħ
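The difference between the two kinds of combination shows up under Unicode normalization. The following sketch compares composite and precomposed forms after normalizing both to NFC; `is_equivalent` is a hypothetical helper written for this comparison.

```python
import unicodedata

def is_equivalent(a, b):
    """Return True if a and b are the same text after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Overstrike: a + COMBINING ACUTE ACCENT composes to the precomposed letter.
print(is_equivalent("a\u0301", "\u00e1"))   # True

# Overlay: h + COMBINING SHORT STROKE OVERLAY does not compose to H WITH STROKE.
print(is_equivalent("h\u0335", "\u0127"))   # False
```

In practice this means that searching or sorting will treat the two overstrike encodings as the same letter, but will treat the composite overlay and the precomposed letter as different text.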
Overstrike and understrike diacritics are useful for phonetic transcriptions
U+0300 COMBINING GRAVE ACCENT | low tone
U+0309 COMBINING HOOK ABOVE | Amer: laryngealized
U+0318 COMBINING LEFT TACK BELOW | IPA: advanced
U+0325 COMBINING RING BELOW | IPA: voiceless
U+0328 COMBINING OGONEK | Amer: nasal
Combining overstrike and understrike characters
The Unicode standard discourages the use of overlay diacritics and of the palatalized and retroflex hook diacritics. Composite characters that use them are not considered equivalent to precomposed characters. Always use precomposed versions if possible. Precomposed Unicode characters are available for all common and many rare combinations. If you cannot find the precomposed version you need, contact NRSI to see if it can be added to Unicode.
U+0321 COMBINING PALATALIZED HOOK BELOW | U+01AB LATIN SMALL LETTER T WITH PALATAL HOOK | IPA: palatalized; superseded by U+02B2 MODIFIER LETTER SMALL J
U+0322 COMBINING RETROFLEX HOOK BELOW | U+0290 LATIN SMALL LETTER Z WITH RETROFLEX HOOK | IPA: retroflex; superseded by U+02DE MODIFIER LETTER RHOTIC HOOK
U+0334 COMBINING TILDE OVERLAY | U+026B LATIN SMALL LETTER L WITH MIDDLE TILDE | IPA: velarized or pharyngealized
U+0335 COMBINING SHORT STROKE OVERLAY | U+0289 LATIN SMALL LETTER U BAR | Amer: fricative or central vocoid; some orthographies
U+0336 COMBINING LONG STROKE OVERLAY | U+01E5 LATIN SMALL LETTER G WITH STROKE | Amer: fricative or central vocoid; some orthographies
U+0337 COMBINING SHORT SOLIDUS OVERLAY | U+023C LATIN SMALL LETTER C WITH STROKE | Obs: fricative; some orthographies
U+0338 COMBINING LONG SOLIDUS OVERLAY | U+023E LATIN CAPITAL LETTER T WITH DIAGONAL STROKE | Obs: fricative; some orthographies
Combining overlay and combining hook characters
Combined letter forms
Letters that are combinations of two letters may be named
- by simply combining the two letter names
- by combining the two letter names and adding DIGRAPH to the name, or
- as LIGATURES instead of LETTERS.
Examples:
A E | LATIN CAPITAL LETTER AE |
d b | LATIN SMALL LETTER DB DIGRAPH |
O E | LATIN CAPITAL LIGATURE OE, LATIN SMALL LIGATURE OE | These are the only two Unicode characters with LIGATURE in their name that should be used as letters
Unicode two letter combinations
Most Unicode digraphs are two letters that touch but are otherwise unmodified. However, in some fonts the letters of some digraphs may not touch.
Most Unicode ligatures (with the exception of the OE and oe ligatures, which are also used as phonetic letters) exist only for backward compatibility and should not be used.
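One rough way to spot characters that exist for backward compatibility is to check whether Unicode gives them a compatibility decomposition. This heuristic is an assumption of the sketch below, not a rule stated on this page, and `has_compatibility_decomposition` is a hypothetical helper. It also flags modifier letters and other superscripts, so a positive result is a prompt to look more closely, not a verdict.

```python
import unicodedata

def has_compatibility_decomposition(ch):
    """Return True if ch decomposes to other characters with a formatting tag such as <compat>."""
    return unicodedata.decomposition(ch).startswith("<")

for ch in ("\ufb01", "\uff21", "\u0153", "\u00c6", "\u0238"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          has_compatibility_decomposition(ch))
# U+FB01 LATIN SMALL LIGATURE FI True
# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A True
# U+0153 LATIN SMALL LIGATURE OE False
# U+00C6 LATIN CAPITAL LETTER AE False
# U+0238 LATIN SMALL LETTER DB DIGRAPH False
```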
Abbreviations
- IPA International Phonetic Alphabet. International standard for writing languages phonetically.
- Amer Americanist phonetic writing system. Popularized by American linguists in the 1950s and 1960s. Designed to use characters that could be written using a typewriter.
- UPA Uralic Phonetic Alphabet. Finno-Ugric phonetic transcription system that has been used primarily in Finland since 1903.
© 2003-2024 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Writing Systems Technology team (formerly known as NRSI).