
NRSI: Computers & Writing Systems

Short URL: http://scripts.sil.org/BasicCharSet

Basic Set of characters needed in a Non-Roman font

NRSI team, 2010-12-09

Some people have asked what a basic character set for a Non-Roman font should include (besides the Non-Roman characters). The chart below is our recommendation for a basic set of characters. It includes the union of Windows CP1252 and Mac-Roman.

Although this page is not intended to explain how to implement OpenType for these characters, there are some notes which might be valuable for implementers.
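The union of the two codepages mentioned above can be approximated with Python's built-in codecs. This is only a sketch: it assumes Python's `cp1252` and `mac_roman` codecs match the codepage definitions the chart was built from (they may differ in details, such as private-use or later-revised mappings), and the helper names `decodable_chars` and `describe` are illustrative, not part of any SIL tool.

```python
def decodable_chars(encoding, first_byte=0x20):
    """Return the Unicode code points reachable from single bytes in `encoding`."""
    code_points = set()
    for byte in range(first_byte, 0x100):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            # Some bytes are undefined in a codepage (e.g. 0x81 in cp1252).
            continue
        code_points.add(ord(char))
    return code_points


# Assumption: Python's codec tables stand in for the two codepages.
cp1252 = decodable_chars("cp1252")
mac_roman = decodable_chars("mac_roman")
union = cp1252 | mac_roman


def describe(code_point):
    """Say which codepage(s) contain a code point, in the chart's wording."""
    in_win = code_point in cp1252
    in_mac = code_point in mac_roman
    if in_win and in_mac:
        return "Codepage 1252 and Mac-Roman"
    if in_win:
        return "Codepage 1252"
    if in_mac:
        return "Mac-Roman"
    return "neither"


if __name__ == "__main__":
    # List only the characters beyond Basic Latin and Latin-1 Supplement.
    for cp in sorted(union):
        if cp > 0xFF:
            print(f"U+{cp:04X}  {describe(cp)}")
```

Comparing the printed list with the chart is a quick way to spot characters that come from only one of the two codepages.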

Block Chars Why Include?
  Basic Latin 0020..007F Codepage 1252 and Mac-Roman
  Latin-1 Supplement 00A0..00FF Codepage 1252 (many are also in Mac-Roman)
  Latin Extended-A 0131 Mac-Roman
  Latin Extended-A 0152..0153, 0160..0161, 0178, 017D..017E Codepage 1252 (many are also in Mac-Roman)
  Latin Extended-B 0192 Codepage 1252 and Mac-Roman
  IPA Extensions Some have requested adding any IPA characters that are in use in the country. NRSI recommends against trying to make your Non-Roman font suitable for linguistics as well; encourage linguists to use a complete font such as Doulos SIL or Charis SIL. An application like FLEx should recognize that such linguistic "markup" may be in a different script from the vernacular data, so it may need to use separate fonts (and writing system behaviors) for elements that are not in the same orthography as the vernacular data.
  Spacing Modifier Letters 02C6, 02DC Codepage 1252 and Mac-Roman
  Spacing Modifier Letters 02C7, 02D0, 02D8..02DB Mac-Roman
  Combining Diacritical Marks 034F Add if your Non-Roman script needs the COMBINING GRAPHEME JOINER (CGJ).
  Greek and Coptic 03C0 Mac-Roman
  General Punctuation 2000..2012

Note 1: Control characters, typically shown in the standard with a dotted square box, should be included to support publishing and Non-Roman fonts. Depending on rendering engine and smart font logic, the default glyph for a control character might be (1) a visible glyph which, if a "show invisibles" feature is not enabled, is then either deleted or substituted by an invisible glyph during rendering, or (2) an invisible glyph which can then be substituted by a visible glyph using the "show invisibles" feature.

Note 2: Many of the spaces and dashes are necessary for publishing.

Note 3: Some punctuation characters in a font need to work for both Roman and Non-Roman scripts, but each script may need a different shape. In that case, include both sets of punctuation glyphs in the font. The cmap should point to the Latin-compatible glyphs; the others should be unencoded. Then, in the OpenType table for your Non-Roman script, use the "ccmp" feature to substitute the Non-Roman-compatible glyphs for the Latin ones, so that the font works for both scripts. This only works if the Non-Roman script is well supported by applications and rendering engines. For a script such as Ethiopic, the substitution of Ethiopic-style punctuation never takes place. In that case, make Ethiopic-style punctuation the default; a Stylistic Set is an option for turning on Latin-style punctuation.
  General Punctuation 2013..2014 Codepage 1252 and Mac-Roman
  General Punctuation 2015 See "Note 2" above.
  General Punctuation 2018..201A, 201C..201E, 2020..2022, 2026 Codepage 1252 and Mac-Roman
  General Punctuation 2027 Used in publishing.
  General Punctuation 2028..202F See "Note 1" above.
  General Punctuation 2030, 2039..203A Codepage 1252 and Mac-Roman
  General Punctuation 2044 Mac-Roman
  General Punctuation 2060 See "Note 1" above.
  Currency Symbols 20AC Codepage 1252. Consider adding any currency symbols needed in the countries where the font might be used.
  Letterlike Symbols 2122 Codepage 1252 and Mac-Roman
  Letterlike Symbols 2126 Mac-Roman
  Mathematical Operators 2202, 2206, 220F, 2211 Mac-Roman
  Mathematical Operators 2219 Sometimes used instead of 00B7.
  Mathematical Operators 221A, 221E, 222B, 2248, 2260, 2264..2265 Mac-Roman
  Geometric Shapes 25CA Mac-Roman
  Geometric Shapes 25CC

If your OpenType font supports combining diacritics, be sure to include U+25CC DOTTED CIRCLE in your font, and optionally include this in your positioning rules for all your combining marks. This is because Uniscribe will insert U+25CC between "illegal" diacritic sequences (such as two U+064E characters in a row) to make the mistake more visible.

See http://www.microsoft.com/typography/otfntdev/arabicot/other.htm.
  Alphabetic Presentation Forms FB01..FB02 Mac-Roman
  Variation Selectors FE00..FE0F

We recommend that all fonts include support for Unicode variation selectors, even if the characters supported by a font don't combine with VSs — in fact, especially if they don't. I.e., add them to the cmap and point them to null glyphs.

The reason is this: it's possible that at some point in the future a VS mapping could be defined for potentially any character in Unicode. It's not all that likely to happen for the characters in the standard now, but there is no way in principle to guarantee it. If at some point in the future text started appearing with VSs where you didn't expect them before (e.g. a VS within Cyrillic script), then you wouldn't want people (using your previously existing fonts) to suddenly start seeing boxes (or whatever is used to represent unsupported glyphs). Of course, your font would not display the variant glyph they would like to see, but the font would still display something legible.

Related to this, people occasionally need a way to see hidden control characters such as VSs, ZWJs, viramas and other similar characters. All fonts should include control picture glyphs for all of these, and ... that there be an OT feature that turns off any shaping based on these controls and causes these control pictures to be displayed. [1]
  Arabic Presentation Forms-B FEFF BYTE ORDER MARK; making this visible might be helpful.
  Specials FFFC..FFFD Encoding conversion utilities often put these in, and it's a lot easier for someone looking at the converted text to figure out what's going on if these have a visual representation.

[1] This note was first posted to an OpenType mailing list by a member of the NRSI team.
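
The chart above can be turned into a simple coverage check. The following is a minimal sketch, not an NRSI tool: the names `RECOMMENDED`, `parse_ranges` and `missing_from` are hypothetical, the ranges are transcribed from the chart (U+034F is left out because it is only needed for some scripts), and the caller is assumed to supply the set of code points present in the font's cmap.

```python
# Code point ranges transcribed from the chart, one chart row per line.
RECOMMENDED = """
0020..007F
00A0..00FF
0131
0152..0153, 0160..0161, 0178, 017D..017E
0192
02C6, 02DC
02C7, 02D0, 02D8..02DB
03C0
2000..2012
2013..2014
2015
2018..201A, 201C..201E, 2020..2022, 2026
2027
2028..202F
2030, 2039..203A
2044
2060
20AC
2122
2126
2202, 2206, 220F, 2211
2219
221A, 221E, 222B, 2248, 2260, 2264..2265
25CA
25CC
FB01..FB02
FE00..FE0F
FEFF
FFFC..FFFD
"""


def parse_ranges(spec):
    """Expand text such as '0152..0153, 0178' into a set of code points."""
    code_points = set()
    for item in spec.replace("\n", ",").split(","):
        item = item.strip()
        if not item:
            continue
        first, _, last = item.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        code_points.update(range(start, end + 1))
    return code_points


def missing_from(cmap_code_points):
    """Return the recommended code points absent from the given cmap, sorted."""
    return sorted(parse_ranges(RECOMMENDED) - set(cmap_code_points))


if __name__ == "__main__":
    # Hypothetical font that covers only Basic Latin.
    example_cmap = set(range(0x20, 0x80))
    for cp in missing_from(example_cmap)[:5]:
        print(f"missing U+{cp:04X}")
```

With the fontTools library installed, the keys of `TTFont(path).getBestCmap()` could serve as the `cmap_code_points` argument. Note that the check only confirms a code point is mapped; it cannot tell whether the variation selectors point to null glyphs or whether control characters have suitable visible forms.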

© 2003-2017 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.