Computers & Writing Systems
SIL Encore Fonts version 3.0 Unicode CSTs
Introduction

By putting Trans Unicode at the top of your CST, you tell the TypeCaster font compiler (version 3.0 or later) to interpret access codes as Unicode values (rather than Windows or Ventura character codes). This makes it possible to build fonts that cover multiple codepages simultaneously. This is the same mechanism that lets the single Times New Roman font supply Western, Greek, Turkish, Baltic, Central European, and Cyrillic scripts.

This document is neither a Unicode tutorial nor a codepage tutorial; it assumes you already know about these subjects. Rather, it covers how to use TypeCaster to generate fonts that cover multiple codepages by expressing the access codes as Unicode.

Using the TypeCaster Editor and Catalog with Unicode CSTs

The TypeCaster CST Editor cannot handle CST files that implement Unicode. You will have to edit such CSTs with Notepad or some other simple text editor. Even though the TypeCaster Editor cannot be used directly, several techniques make creating a CST “by hand” a little easier.

Copy/Paste from the Catalog

You can use the TypeCaster Catalog application to obtain the SILID (and comment!) for insertion into a simple text editor. Select the glyph(s) you want in the catalog view and press Ctrl-C (or use the equivalent menu command). Then switch to the text editor and press Ctrl-V (or use the equivalent menu command) to paste the text in. The pasted text will include the SILID and comment field.
For example, if you had both SILID 2403 and SILID 6103 selected in the catalog and used this technique, here is what would be pasted into your text editor:

    2403 /* hooktop B */
    6103 /* grave accent over upper case */

Now add in the access code you want for each of these, converting to a composite if needed:

    X1234/ 2403 /* hooktop B */
         / 6103 /* grave accent over upper case */

Use TypeCaster Editor, then open CST with Notepad

Another useful technique is to use the TypeCaster CST Editor to create a CST with the characters you need, including any composites. Save the CST, open it with a text editor such as Notepad, and use cut and paste to copy the entries to your Unicode CST. Finally, change the access codes to the Unicode values that you want.

Opening CSTs With Notepad

If you try to open a Unicode CST with the TypeCaster CST Editor, the Editor will display an error about not being able to process the CST file and offer to open it with Notepad. While this is one way to get to a text editor, it isn’t the most direct. Here are some tricks that you may want to use.

From the main window of TypeCaster Compiler you can select a CST and right-click on it to get a context menu that includes Open With Notepad. This allows you to bypass the TypeCaster Editor step.

If you use Unicode CSTs a lot, you can configure TypeCaster to launch a program other than the TypeCaster Editor when you double-click a CST in the main TypeCaster window. From TypeCaster, select the Options/Environment menu, and then the CST Editor tab. Now select Other and enter (or Browse for) the pathname of the desired application (e.g., C:\Windows\Notepad.exe).

Finally, if you like to open CSTs from Windows Explorer, the following registry hack will add an “Open with Notepad” entry to the context menu for CST files, allowing you to bypass the TypeCaster Editor by right-clicking on a CST.
This assumes you know what you are doing with the registry. Please be cautious when editing your registry!

    REGEDIT4

    [HKEY_CLASSES_ROOT\Fedit.Document\shell\edit]
    "EditFlags"=hex:01,00,00,00
    @="Open with &Notepad"

    [HKEY_CLASSES_ROOT\Fedit.Document\shell\edit\command]
    @="C:\\Windows\\Notepad.exe %1"

A slightly safer way to get similar functionality is to put a shortcut to Notepad in your Windows\SendTo folder. This allows you to right-click on any file (including CST files), select SendTo/Notepad, and have the file opened by Notepad.

Access Codes and Default Fill

In most cases you will probably want to specify your Unicode access codes in hexadecimal, since this is how the Unicode standard documents the character set. If, for example, you were putting the IPA extensions into your font, you might have lines in your CST such as:

    /* U+0250 - U+02AF IPA Extensions */
    x0250 1101 /* turned a */
    x0251 1002 /* cursive a */
    x0252 1202 /* turned cursive a */
    x0253 1403 /* hooktop b */
    x0254 1104 /* open o */

You may still specify access codes as decimal integers or single-character ANSI constants for those values where that makes sense (usually 32-255). However, the Compiler does something special with 128-159. Unicode reserves the block from character codes 128 (U+0080) to 159 (U+009F) for control codes, so there is no need to define these in a font. Therefore, as a convenience to CST authors, access codes in this range (and only in this range) are assumed to be Windows character codes (codepage 1252) and are mapped to the equivalent Unicode character.

One useful side effect is that if you have not included either an Encode Symbol or a Fill command in your Unicode CST, then any codepage 1252 character codes (except one) that you have not specified in your CST will automatically be included in your font with the correct glyph. The only exception is Windows character 183, which is an anomaly in the Windows system.
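The rule for access codes 128-159 can be sketched in Python as follows. This is only an illustration, not TypeCaster's own code: the function name is hypothetical, and it assumes Python's built-in cp1252 codec is an acceptable stand-in for the Windows codepage 1252 table. That codec leaves five positions in this range (0x81, 0x8D, 0x8F, 0x90, 0x9D) unassigned, which the sketch reports as None.

```python
def access_code_to_unicode(code):
    """Map a Unicode-CST access code to a Unicode code point (illustrative only).

    Codes 128-159 are treated as Windows codepage 1252 character codes;
    every other code is taken to be a Unicode value already.
    """
    if 128 <= code <= 159:
        try:
            # Assumption: Python's cp1252 codec stands in for the Windows table.
            return ord(bytes([code]).decode("cp1252"))
        except UnicodeDecodeError:
            # Position is unassigned in Python's cp1252 codec.
            return None
    return code


if __name__ == "__main__":
    for code in (128, 153, 0x0250):
        mapped = access_code_to_unicode(code)
        print(code, "->", "unassigned" if mapped is None else f"U+{mapped:04X}")
```

For instance, access code 128 comes out as U+20AC (the euro sign at position 0x80 of codepage 1252), while x0250 is passed through unchanged.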
In Windows 3.1, character 183 maps to Unicode U+2219 (BULLET OPERATOR) in TrueType fonts. For compatibility this is retained in present Windows systems, but if an application asks the OS to convert text from ANSI to Unicode, character 183 will be mapped to U+00B7 (MIDDLE DOT) instead. TypeCaster chooses this latter mapping.

The bottom line is this: if you want to build a font that has all the standard Windows characters (i.e., codepage 1252) plus some extra Unicode characters (e.g., additional codepage coverage), start with the following CST and add the Unicode access codes that you need:

    Trans Unicode
    Encode Normal
    CodepageRange    /* See below for help in completing the */
    UnicodeRange     /* CodepageRange and UnicodeRange entries */

    /* This is included to complete the Windows character set: */
    x2219 0002 /* 1252: B7 ;bullet operator */

Codepage and Unicode coverage

Now that you can build a font containing several hundred glyphs and covering several codepages, you must tell Windows which codepages the font actually covers. You must also identify which Unicode blocks are covered.

By the way, “coverage” is not a well-defined term. Microsoft does not specify what characters, or even what percentage of characters, you have to provide within a given codepage or Unicode block to qualify as covering that range or block. Use your own judgment: if you think you have enough to be useful, then tell Windows that you do cover that range or block.

Codepage and Unicode coverage is specified using two fields in the OpenType font. Each field is a bit vector in which each bit represents a codepage or Unicode block; a value of 1 indicates the font covers that codepage or block, and a value of 0 indicates it does not. The codepage range vector is 64 bits long, and the Unicode range vector is 128 bits long. The definitions for these bits, that is, which bit indicates which codepage or Unicode block, are given in the Appendix.
In a CST you specify the codepage and Unicode range vectors with two special commands at the top of the CST. Each command requires a comma-separated list of 32-bit numbers, each specified as 8 hexadecimal digits (leading zeros may be omitted). The CodePageRange command accepts up to two numbers (totaling 64 bits), and the UnicodeRange command accepts up to four numbers (totaling 128 bits). As an example, you might have the following commands at the top of your CST:

    trans unicode
    encode normal
    codepage 97
    unicode 8000027F,10006079

(All CST commands may be abbreviated to as few as two letters, so co is the same as codepage is the same as CodePageRange.)

The hexadecimal arguments are interpreted as follows. Each number represents 32 bits of the coverage vector. The first number represents bits 0-31, the next bits 32-63, and so on. Within each number, the least significant bit is the lowest numbered and the most significant bit is the highest numbered. Each hex digit, of course, represents 4 bits. Pictorially, the Unicode range command in the above example would be interpreted as follows:

[Figure: bit-by-bit layout of the UnicodeRange arguments 8000027F,10006079]

In this case, since only two of the four possible UnicodeRange parameters have been specified, the remaining two default to zero, so none of bits 64-127 are set.

Referring to the Codepage range bits table in the Appendix and applying the same layout, we see that the line codepage 97 indicates that this font covers the codepages represented by bits 0, 1, 2, 4, and 7, which are codepages 1252 (Latin 1), 1250 (Latin 2: Eastern Europe), 1251 (Cyrillic), 1254 (Turkish), and 1257 (Windows Baltic).

For an Excel workbook that can calculate the hexadecimal values for Unicode and OS/2 range values, see OS/2 table Range bit calculation workbook.

Testing your font

If you have built and installed what you think is a Unicode font that covers multiple codepages, there are a number of ways you can test whether you got the codepage and Unicode ranges right.
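One check that needs no installed font is to decode the range arguments back into bit numbers and compare them with the Appendix tables. The Python sketch below does that, and also goes the other way (bit numbers to arguments). The function names are illustrative and not part of TypeCaster; the sketch assumes only the argument layout described above (comma-separated 32-bit hex words, lowest-numbered bits first).

```python
def decode_range(arg_string):
    """Return the sorted bit numbers set by a comma-separated list of hex words."""
    bits = []
    for index, word in enumerate(arg_string.split(",")):
        value = int(word.strip(), 16)
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"argument {word!r} does not fit in 32 bits")
        for bit in range(32):
            if value & (1 << bit):
                # Word 0 holds bits 0-31, word 1 holds bits 32-63, and so on.
                bits.append(32 * index + bit)
    return bits


def encode_range(bits, max_words):
    """Build the comma-separated hex argument for the given bit numbers.

    max_words is 2 for CodePageRange and 4 for UnicodeRange.
    """
    words = [0] * max_words
    for bit in bits:
        if not 0 <= bit < 32 * max_words:
            raise ValueError(f"bit {bit} is out of range")
        words[bit // 32] |= 1 << (bit % 32)
    # Trailing zero words may be omitted because they default to zero.
    while len(words) > 1 and words[-1] == 0:
        words.pop()
    return ",".join(f"{word:X}" for word in words)


if __name__ == "__main__":
    print(decode_range("97"))                 # codepage bits
    print(decode_range("8000027F,10006079"))  # Unicode range bits
    print(encode_range([0, 1, 2, 4, 7], max_words=2))
```

Decoding `97` gives bits 0, 1, 2, 4, and 7, matching the codepage example above; looking up what each bit means still requires the Appendix tables.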
Open WordPad and select your font from the pull-down menu. Then drop down the Font Script list and see what character sets are listed; they should match your CodepageRange data. Compare with the available Times New Roman fonts.

A useful tool available free from Microsoft’s web site is the Font Properties extension. It adds property pages (“tabs”) to the dialog you get when you right-click on an OpenType font file and select Properties from the menu. In particular, one of the added pages enumerates the codepages and Unicode blocks supported by the font. This is the most direct way to see whether you got the UnicodeRange and CodePageRange right.

Appendix

Unicode range bits

The following table is excerpted from the TrueType font specification, v1.1.
Codepage range bits

The following table is excerpted from the TrueType font specification, v1.1.
© 2003-2023 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.