This is an archive of the original scripts.sil.org site, preserved as a historical reference. Some of the content is outdated. Please consult our other sites for more current information: software.sil.org, ScriptSource, FDBP, and silfontdev



Short URL: https://scripts.sil.org/UnicodeNames

Unicode Transition Training

Unicode Names

Kent Spielmann, 2005-08-10


If you know how Unicode characters are named you can use tools like Windows Character Map to find the characters you want more easily. This page focuses on how Unicode letters are named, especially those in the Latin character set.

Letter Names

Letters are the basic constituent of all texts written in alphabetic writing systems. Letters contrast with punctuation, combining diacritics, digits (numbers), ideographs (e.g. Chinese), syllables (e.g. Korean, Ethiopic), and various symbols, signs, and shapes.

Unicode names are officially in uppercase and in English, but are not case sensitive. The names may only use the letters A to Z, the digits 0 to 9, space, and hyphen.
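As a rough illustration of these rules, the following Python sketch uses the standard `unicodedata` module to look a character up by name in either letter case, and checks a name against the permitted character set. The helper `describe` and the pattern `NAME_PATTERN` are names invented for this example.

```python
import re
import unicodedata

# The characters the text says a Unicode name may contain:
# A to Z, 0 to 9, space, and hyphen.
NAME_PATTERN = re.compile(r"^[A-Z0-9 -]+$")

def describe(char):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

# lookup() finds a character by name; the name may be given in any case.
from_upper = unicodedata.lookup("LATIN SMALL LETTER EZH")
from_lower = unicodedata.lookup("latin small letter ezh")

print(describe(from_upper))
print(from_upper == from_lower)
print(bool(NAME_PATTERN.match(unicodedata.name(from_upper))))
```

The same lookup is what makes searching by name in a tool like Character Map practical: once you know the naming pattern, you can guess the name and find the character.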

Regular Letter Names

Most letter names contain the following four elements in order:

  • Language or writing system name
  • Case, if the writing system has that distinction
  • Class LETTER
  • Letter name

For the Latin system the letter name is usually just the letter. In other systems the name is usually written out.

Examples:

Language | Case | Class | Letter name | Glyph | Code point
LATIN | CAPITAL | LETTER | B | B | 0042
GREEK | SMALL | LETTER | BETA | β | 03B2
CYRILLIC | CAPITAL | LETTER | BE | Б | 0411
ARMENIAN | SMALL | LETTER | BEN | բ | 0562
DEVANAGARI |  | LETTER | BA | ब | 092C

Unicode names for 'Bs' in different scripts
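The four-element pattern can be sketched in Python as a regular expression applied to the names returned by the standard `unicodedata` module. The function `split_letter_name` and the pattern `LETTER_NAME` are hypothetical names for this example, and the pattern only covers regular letter names of the kind shown in the table.

```python
import re
import unicodedata

# Writing system, optional case, the class word LETTER, then the letter name.
LETTER_NAME = re.compile(
    r"^(?P<script>.+?) (?:(?P<case>CAPITAL|SMALL) )?LETTER (?P<letter>.+)$"
)

def split_letter_name(char):
    """Return (script, case, letter name), or None if the name lacks LETTER."""
    match = LETTER_NAME.match(unicodedata.name(char))
    if match is None:
        return None
    return match.group("script"), match.group("case"), match.group("letter")

# The five 'B' letters from the table above.
for code_point in (0x0042, 0x03B2, 0x0411, 0x0562, 0x092C):
    print(f"{code_point:04X}", split_letter_name(chr(code_point)))
```

For Devanagari, which has no case distinction, the case element comes back as `None`.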

Note

Letters that are part of writing systems always have the word LETTER in their name. Do not use Unicode letter-like shapes that do not have LETTER in their name (e.g.  U+00A2 CENT SIGN) for orthographies. Also, never use the Fullwidth Letters or Halfwidth letters, which are for backward compatibility with older encodings.
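A simple check along these lines might look like the Python sketch below. It is a heuristic for alphabetic orthographies only: the function name `questionable_characters` is made up for this example, the decision to inspect only letters and symbols is an assumption, and the two OE ligatures are allowed because this page treats them as letters.

```python
import unicodedata

# The OE ligatures are named LIGATURE but are used as letters.
ALLOWED_NON_LETTERS = {"\u0152", "\u0153"}

def questionable_characters(text):
    """List (code point, name) for characters that may not belong in an orthography."""
    flagged = []
    for char in text:
        category = unicodedata.category(char)
        # Assumption: only letters (L*) and symbols (S*) are inspected;
        # marks, digits, punctuation, and spaces are skipped.
        if not category.startswith(("L", "S")):
            continue
        if char in ALLOWED_NON_LETTERS:
            continue
        name = unicodedata.name(char, "")
        words = name.split()
        no_letter_word = "LETTER" not in words
        width_variant = bool(words) and words[0] in ("FULLWIDTH", "HALFWIDTH")
        if no_letter_word or width_variant:
            flagged.append((f"U+{ord(char):04X}", name))
    return flagged

print(questionable_characters("\u00a2ent"))   # cent sign used as a letter
print(questionable_characters("\uff21bc"))    # fullwidth A
```

Ideographs and syllables would also be flagged by this check, so it is only meaningful for text in an alphabetic writing system.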

Written Out Latin Letter Names

The Latin writing system includes a number of letters whose basic names are written out. Examples:

Language | Case | Class | Letter name | Glyph | Code point
LATIN | CAPITAL | LETTER | AE | Æ | 00C6
LATIN | CAPITAL | LETTER | ENG | Ŋ | 014A
LATIN | SMALL | LETTER | EZH | ʒ | 0292
LATIN | SMALL | LETTER | HENG WITH HOOK | ɧ | 0267
LATIN | SMALL | LETTER | IOTA | ɩ | 0269

Latin letters with written out names


A few IPA letters have a phonetic description as their Unicode name. Examples:

Language | (Case) | Class | Letter name | Glyph | Code point
LATIN |  | LETTER | BIDENTAL PERCUSSIVE | ʭ | 02AD
LATIN |  | LETTER | BILABIAL CLICK | ʘ | 0298
LATIN |  | LETTER | GLOTTAL STOP | ʔ | 0294
LATIN | CAPITAL | LETTER | GLOTTAL STOP | Ɂ | 0241
LATIN |  | LETTER | PHARYNGEAL VOICED FRICATIVE | ʕ | 0295
LATIN |  | LETTER | VOICED LARYNGEAL SPIRANT | ᴤ | 1D24

IPA letters with phonetic names

Case Pairs

In the Latin writing system, most letters are specified as either CAPITAL (uppercase) or SMALL (lowercase) and generally have a corresponding lower/uppercase equivalent character. This means that if you “toggle” the case of a word, most Latin letters change their case. Those letters not specified for case in their name generally do not have a corresponding upper/lowercase equivalent, and therefore do not change.

The following tables give exceptions to the rule:

Upper case | Lower case
U+01F7 LATIN CAPITAL LETTER WYNN | U+01BF LATIN LETTER WYNN
U+01A6 LATIN LETTER YR | U+0280 LATIN LETTER SMALL CAPITAL R

Upper/lowercase pairs where one member is not specified for case

Note

In Unicode 5.0 Capital Glottal Stop was given a lower case version. The neutral case version should be used for phonetic writing systems that do not need this distinction, such as IPA.

Upper case | Lower case | Neutral case
U+0241 LATIN CAPITAL LETTER GLOTTAL STOP | U+0242 LATIN SMALL LETTER GLOTTAL STOP | U+0294 LATIN LETTER GLOTTAL STOP

Three-way distinction for Glottal Stop
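The case behaviour described in this section can be observed with Python's built-in string methods, which follow the case mappings of whatever Unicode version the interpreter ships with. The helper `toggles_case` is a name invented for this sketch.

```python
import unicodedata

def toggles_case(char):
    """True if uppercasing or lowercasing changes the character."""
    return char.upper() != char or char.lower() != char

# Wynn, Yr / small capital R, and the three glottal stops from the tables above.
for char in ("\u01f7", "\u01bf", "\u01a6", "\u0280", "\u0241", "\u0242", "\u0294"):
    print(
        f"U+{ord(char):04X} {unicodedata.name(char)}: "
        f"upper=U+{ord(char.upper()):04X} "
        f"lower=U+{ord(char.lower()):04X} "
        f"toggles={toggles_case(char)}"
    )
```

LATIN LETTER WYNN and LATIN LETTER SMALL CAPITAL R change case even though their names do not mention case, while the neutral LATIN LETTER GLOTTAL STOP is left unchanged.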

Two CAPITAL letters for SENĆOŦEN (British Columbia) were encoded without a corresponding lowercase equivalent, because SENĆOŦEN orthography does not use lowercase letters. (Lowercase counterparts, U+2C65 and U+2C66, were later added in Unicode 5.0.)

  •  U+023A LATIN CAPITAL LETTER A WITH STROKE,
  •  U+023E LATIN CAPITAL LETTER T WITH DIAGONAL STROKE

Unicode SMALL CAPITAL letters do not have lowercase equivalents. They should only be used for phonetic transcriptions; otherwise you should use character formatting.

Characters | Comment
U+0262 LATIN LETTER SMALL CAPITAL G | IPA: Voiced uvular plosive
U+026A LATIN LETTER SMALL CAPITAL I | IPA: Near-close near-front unrounded vowel
U+0274 LATIN LETTER SMALL CAPITAL N | IPA: Voiced uvular nasal
U+0276 LATIN LETTER SMALL CAPITAL OE | IPA: Open front rounded vowel
U+0280 LATIN LETTER SMALL CAPITAL R | IPA: Voiced uvular trill
U+0281 LATIN LETTER SMALL CAPITAL INVERTED R | IPA: Voiced uvular fricative
U+028F LATIN LETTER SMALL CAPITAL Y | IPA: Near-close near-front rounded vowel
U+0299 LATIN LETTER SMALL CAPITAL B | IPA: Voiced bilabial trill
U+029B LATIN LETTER SMALL CAPITAL G WITH HOOK | IPA: Voiced uvular implosive
U+029C LATIN LETTER SMALL CAPITAL H | IPA: Voiceless epiglottal fricative
U+029F LATIN LETTER SMALL CAPITAL L | IPA: Voiced velar lateral approximant
U+1D00..U+1D2B etc. | Uralic Phonetic Alphabet (UPA)

Uppercase letters without lowercase equivalents

SMALL case letters without a corresponding uppercase equivalent in Unicode (besides phonetic characters):

Character | Uppercase
U+00DF LATIN SMALL LETTER SHARP S | German: Uppercase is SS (Double capital S)
U+0138 LATIN SMALL LETTER KRA | Greenlandic (old orthography): Uppercase is K’ (K + U+02BC MODIFIER LETTER APOSTROPHE)

Lowercase letters without uppercase Unicode equivalents
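Because the uppercase of sharp s is two letters, changing case can change the length of a string and cannot always be undone. A minimal Python sketch, using only built-in string methods (the helper name `uppercase_expands` is invented here):

```python
def uppercase_expands(char):
    """True if uppercasing one character yields more than one character."""
    return len(char.upper()) > 1

word = "stra\u00dfe"            # German 'strasse' spelled with sharp s
shouted = word.upper()

print(shouted)                   # STRASSE
print(len(word), len(shouted))   # 6 7
print(shouted.lower() == word)   # False: the sharp s is not restored
print(uppercase_expands("\u00df"))
```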

Modified Letters

There are two basic ways a letter may be modified, and a letter may be modified in either or both of these ways.

  • Overall change: If the basic letter is modified in an overall manner, the name of the modification is put before the letter name.
  • Added feature change: If accents or other additional strokes or hooks are added to the basic letter, WITH is put after the basic letter name, followed by the names of the features.
  • Both of the above changes: If both of the above modifications are applied to a letter, its Unicode name is modified accordingly.

This is easier to see than explain, so look at these examples drawn from the Latin writing system, which has the most modified letters. Lists of other modifiers follow the examples.

Overall Modifications

Overall modifications to a letter are normally indicated by a word or words that precede the letter name. Examples:

Language | (Case) | Class | Overall modification | Letter name | Glyph | Code point
LATIN |  | LETTER | INVERTED | GLOTTAL STOP | ʖ | 0296
LATIN | SMALL | LETTER | REVERSED | OPEN E | ɜ | 025C
LATIN | SMALL | LETTER | CLOSED REVERSED | OPEN E | ɞ | 025E
LATIN | SMALL | LETTER | TURNED | H | ɥ | 0265
LATIN |  | LETTER | STRETCHED | C | ʗ | 0297
LATIN |  | LETTER | SMALL CAPITAL | N | ɴ | 0274
LATIN |  | LETTER | SMALL CAPITAL INVERTED | R | ʁ | 0281

Examples of overall modifications to Latin letter names

This table shows the most common overall modifications that precede letter names.

Modification | Examples | Code point
Small Capital | LATIN LETTER SMALL CAPITAL R | 0280
 | LATIN LETTER SMALL CAPITAL OE | 0276
 | LATIN LETTER SMALL CAPITAL G WITH HOOK | 029B
Turned | LATIN SMALL LETTER TURNED ALPHA | 0252
 | LATIN SMALL LETTER TURNED T | 0287
 | LATIN SMALL LETTER TURNED Y | 028E
Reversed | LATIN CAPITAL LETTER REVERSED E | 018E
 | LATIN SMALL LETTER REVERSED E | 0258
 | LATIN SMALL LETTER SQUAT REVERSED ESH | 0285
 | LATIN SMALL LETTER CLOSED REVERSED OPEN E | 025E

Modifications that precede letter names

This table shows additional overall modifiers that are restricted to certain letters in the Latin script.

Term | Used with | Name | Glyph | Code point
Barred | o | LATIN SMALL LETTER BARRED O | ɵ | 0275
Closed | Omega | LATIN SMALL LETTER CLOSED OMEGA | ɷ | 0277
 | Open E | LATIN SMALL LETTER CLOSED OPEN E | ʚ | 029A
Dotless | I | LATIN SMALL LETTER DOTLESS I | ı | 0131
 | J | LATIN SMALL LETTER DOTLESS J WITH STROKE | ɟ | 025F
Inverted | Glottal Stop | LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE | ƾ | 01BE
Long | S | LATIN SMALL LETTER LONG S | ſ | 017F
Open | O | LATIN CAPITAL LETTER OPEN O | Ɔ | 0186
 | o | LATIN SMALL LETTER OPEN O | ɔ | 0254
 | E | LATIN CAPITAL LETTER OPEN E | Ɛ | 0190
 | e | LATIN SMALL LETTER OPEN E | ɛ | 025B
Stretched | C | LATIN LETTER STRETCHED C | ʗ | 0297
Sharp | S | LATIN SMALL LETTER SHARP S | ß | 00DF
Script | g | LATIN SMALL LETTER SCRIPT G | ɡ | 0261
Subscript | a | LATIN SUBSCRIPT SMALL LETTER A | ₐ | 2090
 | e | LATIN SUBSCRIPT SMALL LETTER E | ₑ | 2091
 | o | LATIN SUBSCRIPT SMALL LETTER O | ₒ | 2092
 | x | LATIN SUBSCRIPT SMALL LETTER X | ₓ | 2093
 | schwa | LATIN SUBSCRIPT SMALL LETTER SCHWA | ₔ | 2094
Superscript | i | SUPERSCRIPT LATIN SMALL LETTER I | ⁱ | 2071
 | n | SUPERSCRIPT LATIN SMALL LETTER N | ⁿ | 207F

Modifications restricted to certain letters

Added Feature Modifications

Accents, strokes, or hooks added to the basic letter are normally indicated by adding WITH after the letter name, followed by the name of the feature or features.

Features that follow letter names (preceded by WITH)

Name | Example | Related characters
Acute | LATIN CAPITAL LETTER E WITH ACUTE |
  Double | LATIN CAPITAL LETTER O WITH DOUBLE ACUTE |
Bar | LATIN SMALL LETTER U BAR | See also Combining Stroke
  Top | LATIN CAPITAL LETTER B WITH TOPBAR |
Belt | LATIN SMALL LETTER L WITH BELT |
Breve | LATIN SMALL LETTER E WITH BREVE |
  Below | LATIN CAPITAL LETTER H WITH BREVE BELOW |
  Inverted | LATIN CAPITAL LETTER E WITH INVERTED BREVE |
Caron | LATIN CAPITAL LETTER C WITH CARON |
Cedilla | LATIN SMALL LETTER C WITH CEDILLA |
Circumflex | LATIN SMALL LETTER O WITH CIRCUMFLEX |
  Below | LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW |
Circle | See ring |
Comma Below | LATIN CAPITAL LETTER S WITH COMMA BELOW |
Curl | LATIN SMALL LETTER C WITH CURL | Compare with Esh Loop
Diaeresis | LATIN SMALL LETTER U WITH DIAERESIS |
Dot | |
  Above | LATIN SMALL LETTER G WITH DOT ABOVE |
  Below | LATIN SMALL LETTER D WITH DOT BELOW |
Grave | LATIN SMALL LETTER E WITH GRAVE |
  Double | LATIN CAPITAL LETTER A WITH DOUBLE GRAVE |
Hacek | See caron |
Hat | See circumflex |
Hook | LATIN CAPITAL LETTER B WITH HOOK |
  Above | LATIN CAPITAL LETTER A WITH HOOK ABOVE |
  Retroflex | LATIN SMALL LETTER L WITH RETROFLEX HOOK |
  Palatal | LATIN SMALL LETTER T WITH PALATAL HOOK |
  Fishhook | LATIN SMALL LETTER R WITH FISHHOOK |
  Left | LATIN SMALL LETTER N WITH LEFT HOOK |
Horn | LATIN CAPITAL LETTER O WITH HORN |
Long Leg | LATIN SMALL LETTER R WITH LONG LEG |
  Right | LATIN SMALL LETTER N WITH LONG RIGHT LEG |
Line | See also bar and stroke |
Line Below | LATIN SMALL LETTER N WITH LINE BELOW |
Macron | LATIN SMALL LETTER E WITH MACRON |
Ogonek | LATIN SMALL LETTER I WITH OGONEK | Used in Polish as well as in some Americanist phonetic systems for nasalization
Ring | |
  Above | LATIN CAPITAL LETTER A WITH RING ABOVE |
  Below | LATIN CAPITAL LETTER A WITH RING BELOW |
  Right Half | LATIN SMALL LETTER A WITH RIGHT HALF RING |
Slash | See stroke |
Stroke | LATIN CAPITAL LETTER D WITH STROKE, LATIN CAPITAL LETTER L WITH STROKE | See also bar
  Vertical | CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE |
  Diagonal | LATIN CAPITAL LETTER T WITH DIAGONAL STROKE | See also Combining Solidus Overlay
Tail | LATIN SMALL LETTER EZH WITH TAIL |
  Crossed | LATIN SMALL LETTER J WITH CROSSED-TAIL | See also curl
  Swash | LATIN SMALL LETTER Z WITH SWASH TAIL |
Tilde | LATIN SMALL LETTER I WITH TILDE |
  Below | LATIN SMALL LETTER E WITH TILDE BELOW |
  Middle | LATIN SMALL LETTER L WITH MIDDLE TILDE |
Umlaut | See diaeresis |

Features that follow letter names
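The two kinds of modification can be pulled apart mechanically. The Python sketch below is only an illustration: `analyze_modifications` is a hypothetical function, and `OVERALL_MODIFIERS` holds just the overall modifiers shown in the tables of this section, so it is not a complete list.

```python
import re
import unicodedata

# Overall modifiers taken from the tables above (illustrative, not complete).
OVERALL_MODIFIERS = (
    "SMALL CAPITAL", "TURNED", "REVERSED", "INVERTED",
    "CLOSED", "STRETCHED", "SQUAT",
)

LETTER_NAME = re.compile(r"^(.+?) (?:(CAPITAL|SMALL) )?LETTER (.+)$")

def analyze_modifications(char):
    """Return (overall modifiers, basic letter name, features after WITH)."""
    match = LETTER_NAME.match(unicodedata.name(char))
    if match is None:
        return None
    # Added features follow WITH; everything before it is the letter part.
    base, _, features = match.group(3).partition(" WITH ")
    overall = []
    stripped = True
    while stripped:
        stripped = False
        # Overall modifiers precede the letter name, so peel them off the front.
        for modifier in OVERALL_MODIFIERS:
            if base.startswith(modifier + " "):
                overall.append(modifier)
                base = base[len(modifier) + 1:]
                stripped = True
                break
    return overall, base, features

for code_point in (0x029B, 0x025E, 0x00C9, 0x025F):
    print(f"{code_point:04X}", analyze_modifications(chr(code_point)))
```

Names listed under Exceptions, such as LATIN SMALL LETTER U BAR, do not follow the pattern and are not split correctly by this sketch.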

Exceptions

Some older Unicode names are not consistent with the current naming practice. For example:

Modification | Name | Comment
Barred | U+0275 LATIN SMALL LETTER BARRED O | should be O WITH HORIZONTAL STROKE; cf. U+00F8 LATIN SMALL LETTER O WITH STROKE
Bar | U+0289 LATIN SMALL LETTER U BAR | should be U WITH STROKE
Loop | U+01AA LATIN LETTER REVERSED ESH LOOP | should be ESH WITH LOOP
Reversed | U+01B8 LATIN CAPITAL LETTER EZH REVERSED | should be REVERSED EZH

Exceptions to Unicode naming conventions

Modifier Letters

Modifier letters are small raised or lowered letters and other symbols that generally do not stand alone and usually modify the preceding regular letter or base character.1 In some cases modifier letters do represent a separate sound, such as MODIFIER LETTER APOSTROPHE when used as a glottal stop. Sometimes they are also associated with the following letter(s), such as when marking stress.

Modifier letters behave differently from regular letters and symbols.

  • Modifier letters that have letter shapes do not have case pairs. Thus, if you “toggle” the case of a word, modifier letters will not change case as would superscripted Latin letters.
  • Modifier letters that look like symbols or punctuation are not considered to be word break characters when wrapping lines of text or selecting words.2

Modifier letters should be used for the raised letters and symbols used in IPA and other phonetic transcription systems. They can be thought of as spacing diacritics, in contrast to combining diacritics, which are non-spacing. Also, use  MODIFIER LETTER APOSTROPHE for an apostrophe that represents a glottal stop in a regular orthography.

Name | Use
MODIFIER LETTER SMALL H | aspirated
MODIFIER LETTER SMALL H WITH HOOK | breathy voiced release
MODIFIER LETTER SMALL J | IPA: palatalized
MODIFIER LETTER SMALL W | labialized
MODIFIER LETTER SMALL Y | Amer: palatalized
MODIFIER LETTER SMALL GAMMA | IPA: velarized
MODIFIER LETTER SMALL L | IPA: lateral release
MODIFIER LETTER SMALL REVERSED GLOTTAL STOP | IPA: pharyngealized; Amer: implosive
MODIFIER LETTER PRIME | Amer: stress, secondary stress
MODIFIER LETTER DOUBLE PRIME | Amer: primary stress (when secondary stress is marked)
MODIFIER LETTER APOSTROPHE | IPA: ejective; Orthographic: glottal stop
MODIFIER LETTER VERTICAL LINE | IPA: primary stress
MODIFIER LETTER LOW VERTICAL LINE | IPA: secondary stress
MODIFIER LETTER TRIANGULAR COLON | IPA: long
MODIFIER LETTER HALF TRIANGULAR COLON | IPA: half-long

Some commonly used modifier letters
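Both behaviours of modifier letters can be seen in a short Python sketch. It relies on the fact that these characters have the general category Lm (modifier letter); using the regular expression `\w+` as a stand-in for word selection is an assumption of this example, since real applications use their own word-break rules.

```python
import re
import unicodedata

ASPIRATION = "\u02b0"        # MODIFIER LETTER SMALL H
GLOTTAL_STOP = "\u02bc"      # MODIFIER LETTER APOSTROPHE

# Modifier letters are letters (category Lm) but have no case pair.
print(unicodedata.category(ASPIRATION))          # Lm
print(("t" + ASPIRATION + "a").upper())          # only t and a change case

# A modifier apostrophe stays inside the word; an ASCII apostrophe splits it.
with_modifier = re.findall(r"\w+", "k" + GLOTTAL_STOP + "a")
with_ascii = re.findall(r"\w+", "k'a")
print(with_modifier)
print(with_ascii)
```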

Unicode versions 4.0 and 4.1 added nearly 100 modifier letters intended for representing secondary articulations, such as diphthongs, releases, glides, epenthetic sounds, etc. Check out the characters at U+1D43..U+1D61 and U+1D9B..U+1DBF in an SIL font like Doulos SIL.
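If no suitable font viewer is at hand, the names in these two ranges can be listed with a few lines of Python. This is only a convenience sketch; `modifier_letters` is a name invented for the example.

```python
import unicodedata

# The two ranges mentioned above (inclusive).
RANGES = ((0x1D43, 0x1D61), (0x1D9B, 0x1DBF))

def modifier_letters():
    """Return (code point, name) for every character in RANGES."""
    result = []
    for first, last in RANGES:
        for code_point in range(first, last + 1):
            result.append((code_point, unicodedata.name(chr(code_point))))
    return result

for code_point, name in modifier_letters():
    print(f"U+{code_point:04X} {chr(code_point)} {name}")
```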

Tone Bars

Tone bars are modifier letters used in phonetic transcriptions of tonal languages, particularly in Asia. There are five tone bar letters  U+02E5 to  U+02E9 which individually display as horizontal tonebars with a vertical staff on the right. Tone Bars have special properties: When using a smart font like Doulos SIL, two or three sequential tone bar letters will combine to create a contour.
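A contour is encoded simply as a sequence of tone bar letters; the joining into a single contour shape is done by the font. The Python sketch below builds such sequences from tone numerals. The mapping of 5 to extra-high down to 1 for extra-low is an assumption of this example (a common numeric convention), and `tone_contour` is a hypothetical helper.

```python
import unicodedata

# Assumed numeral convention: 5 = extra-high ... 1 = extra-low.
TONE_BARS = {
    "5": "\u02e5",  # MODIFIER LETTER EXTRA-HIGH TONE BAR
    "4": "\u02e6",  # MODIFIER LETTER HIGH TONE BAR
    "3": "\u02e7",  # MODIFIER LETTER MID TONE BAR
    "2": "\u02e8",  # MODIFIER LETTER LOW TONE BAR
    "1": "\u02e9",  # MODIFIER LETTER EXTRA-LOW TONE BAR
}

def tone_contour(levels):
    """Turn a string of tone numerals such as '51' into tone bar letters."""
    return "".join(TONE_BARS[digit] for digit in levels)

falling = tone_contour("51")
print([unicodedata.name(char) for char in falling])
```

Whether the two letters display as a single falling contour depends on the font, as described above for Doulos SIL.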

Smart font behavior of Doulos SIL



Combining characters

Combining Unicode characters are non-spacing characters of two basic types:

  • Those that rest above or below the preceding character (overstrike and understrike), for example accents
  • Those that are actually superimposed upon or connect to the preceding character (overlay and combining hooks)

Overstrike and understrike diacritics can be used to create letter-diacritic combinations that are not in Unicode. They may also be used as an alternate way of encoding precomposed letter-diacritic combinations.

Combining overlay and hook characters should be avoided since they often do not combine well with base characters and cause other typographical problems. Unicode now has a policy of approving precomposed versions of all necessary overlay and hook combination characters. Compare:

Character type | Code points | Browser rendering
Composite overstrike | a + U+0301 COMBINING ACUTE ACCENT | á
Precomposed | U+00E1 LATIN SMALL LETTER A WITH ACUTE | á
Composite overlay | h + U+0335 COMBINING SHORT STROKE OVERLAY | h̵
Precomposed | U+0127 LATIN SMALL LETTER H WITH STROKE | ħ
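The difference between the two kinds of composite can be checked with Unicode normalization. In this Python sketch, `same_after_nfc` is a helper invented for the example; it composes both strings with `unicodedata.normalize` and compares the results.

```python
import unicodedata

def same_after_nfc(first, second):
    """True if the two strings are identical after NFC normalization."""
    return (unicodedata.normalize("NFC", first)
            == unicodedata.normalize("NFC", second))

# Overstrike: a + combining acute is an alternate encoding of a-acute.
print(same_after_nfc("a\u0301", "\u00e1"))   # True

# Overlay: h + combining short stroke is NOT treated as h-with-stroke.
print(same_after_nfc("h\u0335", "\u0127"))   # False
```

The second comparison fails because U+0127 has no decomposition, so software that compares or searches text treats the overlay sequence and the precomposed letter as different.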

Overstrike and understrike diacritics are useful for phonetic transcriptions:

Example | Uses
U+0300 COMBINING GRAVE ACCENT | low tone
U+0309 COMBINING HOOK ABOVE | Amer: laryngealized
U+0318 COMBINING LEFT TACK BELOW | IPA: advanced tongue root
U+0325 COMBINING RING BELOW | IPA: voiceless
U+0328 COMBINING OGONEK | Amer: nasal

Combining overstrike and understrike characters

The Unicode standard has deprecated overlay and palatalized and retroflex hook diacritics. Composite characters that use them are not considered equivalent to precomposed characters. Always use precomposed versions if possible. Precomposed Unicode characters are available for all common and many rare combinations. If you cannot find the precomposed version you need, contact NRSI to see if it can be added to Unicode.

Deprecated combining character | Precomposed character example | Use
U+0321 COMBINING PALATALIZED HOOK BELOW | U+01AB LATIN SMALL LETTER T WITH PALATAL HOOK | IPA: palatalized; superseded by U+02B2 MODIFIER LETTER SMALL J
U+0322 COMBINING RETROFLEX HOOK BELOW | U+0290 LATIN SMALL LETTER Z WITH RETROFLEX HOOK | IPA: retroflex; superseded by U+02DE MODIFIER LETTER RHOTIC HOOK
U+0334 COMBINING TILDE OVERLAY | U+026B LATIN SMALL LETTER L WITH MIDDLE TILDE | IPA: velarized or pharyngealized
U+0335 COMBINING SHORT STROKE OVERLAY | U+0289 LATIN SMALL LETTER U BAR | Amer: fricative or central vocoid; some orthographies
U+0336 COMBINING LONG STROKE OVERLAY | U+01E5 LATIN SMALL LETTER G WITH STROKE | Amer: fricative or central vocoid; some orthographies
U+0337 COMBINING SHORT SOLIDUS OVERLAY | U+023C LATIN SMALL LETTER C WITH STROKE | Obs: fricative; some orthographies
U+0338 COMBINING LONG SOLIDUS OVERLAY | U+023E LATIN CAPITAL LETTER T WITH DIAGONAL STROKE | Obs: fricative; some orthographies

Combining overlay and combining hook characters
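To find places in existing data where these combining characters were used, a scan like the following Python sketch may help. The set `DISCOURAGED` contains exactly the seven code points in the table above, and `find_discouraged_marks` is a hypothetical helper name.

```python
import unicodedata

# The hook and overlay combining characters listed in the table above.
DISCOURAGED = {0x0321, 0x0322} | set(range(0x0334, 0x0339))

def find_discouraged_marks(text):
    """Return (index, code point, name) for each discouraged mark in text."""
    return [
        (index, f"U+{ord(char):04X}", unicodedata.name(char))
        for index, char in enumerate(text)
        if ord(char) in DISCOURAGED
    ]

composite = "u\u0335"      # u + COMBINING SHORT STROKE OVERLAY
precomposed = "\u0289"     # LATIN SMALL LETTER U BAR

print(find_discouraged_marks(composite))
print(find_discouraged_marks(precomposed))
# Normalization does not turn the composite into the precomposed letter.
print(unicodedata.normalize("NFC", composite) == precomposed)
```

Each hit has to be replaced by hand or with a mapping table, because normalization will not do the conversion.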

Combined letter forms

Letters that are combinations of two letters may be named

  • by simply combining the two letter names
  • by combining the two letter names and adding DIGRAPH to the name, or
  • as LIGATURES instead of LETTERS.

Examples:

Combination | Name | Comment
A E | LATIN CAPITAL LETTER AE |
d b | LATIN SMALL LETTER DB DIGRAPH |
O E | LATIN CAPITAL LIGATURE OE, LATIN SMALL LIGATURE OE | These are the only two Unicode characters with LIGATURE in their name that should be used as letters

Unicode two letter combinations

Most Unicode digraphs are two letters that touch but are otherwise unmodified. However, in some fonts the letters of some digraphs may not touch.

Most Unicode ligatures (with exception of the OE and oe Ligatures used as phonetic letters) are only for backward compatibility and should not be used.3
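One way to see which combined forms are compatibility characters is to apply compatibility normalization (NFKC), which replaces a typographic ligature with its component letters and leaves true letters alone. The following Python sketch shows this for a few characters; `compatibility_form` is a name invented for the example.

```python
import unicodedata

def compatibility_form(char):
    """Return the NFKC (compatibility-normalized) form of a character."""
    return unicodedata.normalize("NFKC", char)

samples = (
    "\ufb01",  # LATIN SMALL LIGATURE FI (typographic ligature)
    "\u0153",  # LATIN SMALL LIGATURE OE (used as a letter)
    "\u00e6",  # LATIN SMALL LETTER AE
    "\u0238",  # LATIN SMALL LETTER DB DIGRAPH
)

for char in samples:
    changed = compatibility_form(char) != char
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: "
          f"replaced by component letters = {changed}")
```

Only the fi ligature is replaced; the OE ligature, AE, and the DB digraph are kept as they are.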

Abbreviations

  • IPA International Phonetic Alphabet. International standard for writing languages phonetically.
  • Amer Americanist phonetic writing system. Popularized by American linguists in the 50s and 60s. Designed to use characters that could be written using a typewriter.
  • UPA Uralic Phonetic Alphabet. Finno-Ugric phonetic transcription system that has been used primarily in Finland since 1903.


Note: the opinions expressed in submitted contributions below do not necessarily reflect the opinions of our website.

"[libcat]", Sat, Sep 24, 2005 19:32 (EDT)

Under 'Exceptions':

where you have "LATIN CAPITAL LETTER EZH REVERSED should be REVERSED ESH", the 'should be' should read "REVERSED EZH", not "ESH".

kent_spielmann, Mon, Sep 26, 2005 14:23 (EDT) [modified by martinpk on Mon, Sep 26, 2005 14:25 (EDT)]
Re: REVERSED ESH/EZH

Thank you for your careful attention to detail. The correction has been made.



1 The dotted circle used to represent a base character is Unicode number 25CC.
2 Unfortunately this is not true in Word (2003 and earlier versions) when selecting words. This is a bug which hopefully will be fixed in a future version of Word.
3 Ligatures are nominally typesetting characters that make certain letter combinations look better, particularly those with f, i and j. Most modern software will display or print them automatically so they should not be used in the actual data.

© 2003-2024 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Writing Systems Technology team (formerly known as NRSI). Read our Privacy Policy. Contact us here.