NRSI:
Computers & Writing Systems
Short URL: http://scripts.sil.org/BasicCharSet
Basic Set of characters needed in a Non-Roman font
NRSI team, 2010-12-09
Some people have asked what a basic character set for a Non-Roman font should include (besides the
Non-Roman characters). The chart below is our recommendation for a basic set of characters. It includes the
union of Windows CP1252 and Mac-Roman.
Although this page is not intended to explain how to implement OpenType for these characters, there are
some notes which might be valuable for implementers.
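The union of CP1252 and Mac-Roman described above can be reproduced with Python's built-in `cp1252` and `mac_roman` codecs. This is a minimal sketch, not part of the NRSI recommendation: the function name `repertoire` is hypothetical, and Python's code-page tables follow current vendor mappings, so they may not match the chart slot for slot.

```python
# Compute the union of the Windows code page 1252 and Mac-Roman repertoires
# using only Python's built-in single-byte codecs.


def repertoire(codec: str) -> set[int]:
    """Return the printable code points a single-byte codec can produce."""
    cps: set[int] = set()
    for b in range(256):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page (cp1252 has five)
        cp = ord(ch)
        if cp < 0x20:
            continue  # C0 control characters; the chart starts at U+0020
        if 0xE000 <= cp <= 0xF8FF:
            continue  # private use; Python's mac_roman maps 0xF0 (Apple logo) to U+F8FF
        cps.add(cp)
    return cps


cp1252 = repertoire("cp1252")
mac_roman = repertoire("mac_roman")
union = cp1252 | mac_roman
mac_only = sorted(mac_roman - cp1252)  # characters contributed only by Mac-Roman

print(f"{len(union)} code points in the union")
print(" ".join(f"U+{cp:04X}" for cp in mac_only))
```

`mac_only` lists the Mac-Roman-only additions, such as U+0131, U+03C0 and U+FB01, which correspond to the rows marked "Mac-Roman" in the chart.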
| |
Basic Latin |
0020..007F |
Codepage 1252 and Mac-Roman |
| |
Latin-1 Supplement |
00A0..00FF |
Codepage 1252 (many are also in Mac-Roman)
|
| |
Latin Extended-A |
0131 |
Mac-Roman |
| |
Latin Extended-A |
0152..0153, 0160..0161, 0178, 017D..017E |
Codepage 1252 (many are also in Mac-Roman)
|
| |
Latin Extended-B |
0192 |
Codepage 1252 and Mac-Roman |
| |
IPA Extensions |
|
Some have requested adding any IPA characters that are in use
in the country. NRSI recommends not trying to make your Non-Roman font
suitable for linguistics as well. Encourage linguists to use a complete font such as Doulos SIL or Charis SIL.
An application like FLEx should recognize that such linguistic "markup"
may be in a different script from the vernacular data, so it may need separate fonts (and
writing system behaviors) for elements that are not in the vernacular orthography. |
| |
Spacing Modifier Letters |
02C6, 02DC |
Codepage 1252 and Mac-Roman |
| |
Spacing Modifier Letters |
02C7, 02D8..02DB, 02DD |
Mac-Roman |
| |
Combining Diacritical Marks |
034F |
Add if your Non-Roman script needs the CGJ (COMBINING GRAPHEME JOINER). |
| |
Greek and Coptic |
03C0 |
Mac-Roman |
| |
General Punctuation |
2000..2012 |
Note 1: Control characters, typically shown in the standard with a
dotted square box, should be included to support publishing and Non-Roman fonts. Depending on
rendering engine and smart font logic, the default glyph for a control character might be (1) a
visible glyph which, if a "show invisibles" feature is not enabled, is then either deleted or
substituted by an invisible glyph during rendering, or (2) an invisible glyph which can then be
substituted by a visible glyph using the "show invisibles" feature.
Note 2: Many of the spaces and dashes are necessary for
publishing.
Note 3: Some punctuation characters in a font need to
work for both Roman and Non-Roman scripts, but a different shape may be needed for each. In that
case, include both sets of punctuation in the font. The cmap should point to the
Latin-compatible glyphs; the others should be unencoded. Then, in the OpenType lookups for your
Non-Roman script, use the "ccmp" feature to substitute the Non-Roman-compatible glyphs for the Latin
ones, so the font works for both scripts. This only works if the
Non-Roman script is well supported by applications and rendering engines. For a script
such as Ethiopic, the substitution is never applied, so the Ethiopic-style punctuation never appears. In that case, make
Ethiopic-style punctuation the default and offer Latin-style punctuation through a Stylistic Set.
|
| |
General Punctuation |
2013..2014 |
Codepage 1252 and Mac-Roman |
| |
General Punctuation |
2015 |
See "Note 2" above. |
| |
General Punctuation |
2018..201A, 201C..201E, 2020..2022, 2026 |
Codepage 1252 and Mac-Roman |
| |
General Punctuation |
2027 |
Used in publishing. |
| |
General Punctuation |
2028..202F |
See "Note 1" above. |
| |
General Punctuation |
2030, 2039..203A |
Codepage 1252 and Mac-Roman |
| |
General Punctuation |
2044 |
Mac-Roman |
| |
General Punctuation |
2060 |
See "Note 1" above. |
| |
Currency Symbols |
20AC |
Codepage 1252. Consider also adding any currency symbols that may be needed in the countries where
the font might be used.
|
| |
Letterlike Symbols |
2122 |
Codepage 1252 and Mac-Roman |
| |
Letterlike Symbols |
2126 |
Mac-Roman |
| |
Mathematical Operators |
2202, 2206, 220F, 2211 |
Mac-Roman |
| |
Mathematical Operators |
2219 |
Sometimes used instead of U+00B7 MIDDLE DOT. |
| |
Mathematical Operators |
221A, 221E, 222B, 2248, 2260, 2264..2265 |
Mac-Roman |
| |
Geometric Shapes |
25CA |
Mac-Roman |
| |
Geometric Shapes |
25CC |
If your OpenType font supports combining diacritics, be sure to include U+25CC DOTTED CIRCLE in
your font and, optionally, include it in the positioning rules for all your combining marks. This
is because Uniscribe inserts U+25CC between the marks of an "illegal" diacritic sequence (such as two U+064E ARABIC FATHA
characters in a row) to make the mistake more visible. See http://www.microsoft.com/typography/otfntdev/arabicot/other.htm.
|
| |
Alphabetic Presentation Forms |
FB01..FB02 |
Mac-Roman |
| |
Variation Selectors |
FE00..FE0F |
We recommend that all fonts include support for Unicode variation
selectors, even if the characters supported by a font don't combine with VSs; in fact, especially
if they don't. That is, add them to the cmap and point them to null glyphs.
The reason is this: at some point in the future a variation sequence could be defined
for potentially any character in Unicode. That is not very likely for the characters in
the standard now, but there is no way in principle to guarantee it. If text
started appearing with VSs where you didn't expect them before (e.g. a VS within Cyrillic
script), you wouldn't want people using your existing fonts to suddenly start
seeing boxes (or whatever is used to represent unsupported glyphs). Of course, your font would not
display the variant glyph they would like to see, but it would still display something
legible. Related to this, people occasionally need a way to see hidden control characters such as
VSs, ZWJs, viramas and other similar characters. All fonts should include control picture glyphs
for all of these, and ... that there be an OT feature that turns off any shaping based on these
controls and causes these control pictures to be displayed.
|
| |
Arabic Presentation Forms-B |
FEFF |
BYTE ORDER MARK (also ZERO WIDTH NO-BREAK SPACE); making this visible might be helpful. |
| |
Specials |
FFFC..FFFD |
Encoding conversion utilities often put these in, and it's a
lot easier for someone looking at the converted text to figure out what's going on if these have a
visual representation. |
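To check an existing font against the chart, a short script can expand the ranges above and report what the cmap lacks. This is a hedged sketch: `CHART`, `parse_ranges` and `missing_by_block` are hypothetical names, the ranges are transcribed by hand from the chart (merging rows that share a block, and using U+02DD for Mac-Roman's double acute accent at byte 0xFD), and the cmap is a plain dict rather than a parsed font file. With fontTools installed, `TTFont(path)["cmap"].getBestCmap()` returns a dict of this shape. The check covers presence only; it cannot tell whether the variation selectors point at null glyphs.

```python
# Report which code points from the basic character set chart are missing
# from a font's cmap. The cmap is modelled as a dict {code point: glyph name}.

CHART = [
    ("Basic Latin", "0020..007F"),
    ("Latin-1 Supplement", "00A0..00FF"),
    ("Latin Extended-A", "0131, 0152..0153, 0160..0161, 0178, 017D..017E"),
    ("Latin Extended-B", "0192"),
    # 02C6/02DC (CP1252) plus Mac-Roman's 02C7, 02D8..02DB and 02DD (byte 0xFD)
    ("Spacing Modifier Letters", "02C6..02C7, 02D8..02DD"),
    ("Combining Diacritical Marks", "034F"),  # CGJ: only if the script needs it
    ("Greek and Coptic", "03C0"),
    ("General Punctuation",
     "2000..2015, 2018..201A, 201C..201E, 2020..2022, 2026..202F, "
     "2030, 2039..203A, 2044, 2060"),
    ("Currency Symbols", "20AC"),
    ("Letterlike Symbols", "2122, 2126"),
    ("Mathematical Operators",
     "2202, 2206, 220F, 2211, 2219, 221A, 221E, 222B, 2248, 2260, 2264..2265"),
    ("Geometric Shapes", "25CA, 25CC"),
    ("Alphabetic Presentation Forms", "FB01..FB02"),
    ("Variation Selectors", "FE00..FE0F"),  # should map to a null glyph
    ("Arabic Presentation Forms-B", "FEFF"),
    ("Specials", "FFFC..FFFD"),
]


def parse_ranges(spec: str) -> set[int]:
    """Expand notation like "0152..0153, 0178" into a set of code points."""
    cps: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("..")
        cps.update(range(int(lo, 16), int(hi or lo, 16) + 1))
    return cps


def missing_by_block(cmap: dict[int, str]) -> dict[str, list[str]]:
    """Map each chart block to the "U+XXXX" code points absent from cmap."""
    report: dict[str, list[str]] = {}
    for block, spec in CHART:
        absent = [f"U+{cp:04X}" for cp in sorted(parse_ranges(spec)) if cp not in cmap]
        if absent:
            report[block] = absent
    return report


if __name__ == "__main__":
    # Hypothetical font: Basic Latin plus U+25CC DOTTED CIRCLE, nothing else.
    demo = {cp: f"uni{cp:04X}" for cp in range(0x20, 0x80)}
    demo[0x25CC] = "dottedcircle"
    for block, cps in missing_by_block(demo).items():
        print(f"{block}: {len(cps)} missing, first {cps[0]}")
```

Grouping the report by block mirrors the layout of the chart, so a gap such as an absent U+25CC or missing variation selectors is easy to trace back to the relevant recommendation.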
© 2003-2013 SIL International, all rights
reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative. Contact us at .