NRSI: Computers & Writing Systems

Short URL: http://scripts.sil.org/UTTEditMap

Edit Your Mapping

Joan Wardell, 2005-10-17

Goals for this Step

In this step you will learn how to edit a mapping, write the needed commands, and complete your mapping. Expect to spend 2-3 weeks on either this step or the Creating a Chart of Your Legacy Mapping step described previously. If you have an Encore font, this time is significantly reduced.

This step is part of the procedure How to Write a Conversion Mapping for your Legacy Font.

Note:

The TECkit Mapping Editor (installed previously in this step) lets you modify, compile, and save your .map file. If you prefer a different editor, you may edit with it and use the TECkit Mapping Editor only for compiling.

Completing the Legacy Mapping Workbook

Here is an example of a Legacy Mapping workbook with a legacy font displayed in Column E, showing the letters "a,b,c".

Notice that Columns D and E (the legacy font) show the same character. This means that the proposed Unicode mapping is the same as in a standard font. The Unicode number is found in Column C. For 'a', it is U+0061.

Now look at this example of characters "B,C,D".

Note that the characters in Column E (Legacy Glyph) are not the same as those in Column D. This means the legacy font has replaced B, C, D with other characters. These new characters need to be identified.

Here the Legacy Glyph found in Column E has been identified and typed in the workbook in Column F. The matching Unicode character U+0256 now shows in Column G.

You will need to identify each non-matching Legacy Glyph in your Legacy Mapping Workbook.

Please continue below.

Editing the Map

Drag your new draft mapping (from the previous Create Your Draft TECkit Mapping File step) onto the TECkit Mapping Editor shortcut on your desktop. Adjust the window size as needed. Skip the header for now; it is addressed in the Edit your Mapping Header section below.

Here are a few example lines from the draft mapping file found at Create Your Draft TECkit Mapping File.

0x61    <>    U+0061    ; latin_small_letter_a  --  a
0x62    <>    U+0062    ; latin_small_letter_b  --  b
0x63    <>    U+0063    ; latin_small_letter_c  --  c

Compare these lines with Columns B and F of your Legacy Mapping Workbook. The left column of the mapping file (called the left-hand side) is identical to Column B of the workbook, so you do not need to make any changes there. Next come the characters "<>", followed by the Unicode codepoint (U+0061, etc.) on the right-hand side. This is where the proposed Unicode numbers from Column F of your workbook go.

In the example above, the letters a, b, c are already correct. For 'a', the Unicode number from Column C (U+0061) is what belongs on the right-hand side of the mapping, before the comment that starts with a semicolon. The same applies to the b and c lines.

Now look at the workbook example for the letter B (the last example above). The glyph at code 0x42 was changed to a different character in the legacy font: it is U+0256, LATIN SMALL LETTER D WITH TAIL, instead of B. The corresponding line in the mapping file for this character should then become:

0x42    <>    U+0256    ; latin_small_letter_d_with_tail  --  ɖ
0x43    <>    U+0043    ; latin_capital_letter_c  --  C
0x44    <>    U+0044    ; latin_capital_letter_d  --  D

You may wish to also change the comment as shown (everything after the semicolon).

Once you have completely filled out Column F of your Legacy Mapping Workbook, you only need to edit your mapping and copy those numbers into the appropriate lines of your draft mapping.
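The copying step above can also be sketched in code. The following is a minimal Python illustration, not part of TECkit: it assumes you have exported Column B (legacy code) and Column F (Unicode number) from your workbook as pairs of integers, and the function name `mapping_line` and the sample rows are hypothetical. It formats each pair in the same layout as the draft mapping lines shown earlier.

```python
import unicodedata

def mapping_line(legacy_byte: int, codepoint: int) -> str:
    """Format one round-trip rule in the layout used by the draft mapping."""
    char = chr(codepoint)
    # unicodedata.name() raises ValueError for unnamed code points
    # (for example Private Use Area characters), so those need a
    # hand-written comment instead.
    name = unicodedata.name(char).lower().replace(" ", "_")
    return f"0x{legacy_byte:02x}    <>    U+{codepoint:04X}    ; {name}  --  {char}"

# Hypothetical workbook rows: (Column B legacy code, Column F Unicode number)
rows = [(0x61, 0x0061), (0x42, 0x0256)]

for legacy_byte, codepoint in rows:
    print(mapping_line(legacy_byte, codepoint))
```

Every generated line should still be checked against the glyph in your workbook; the script only saves typing.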

Make decisions for each character.

How to Identify your Legacy Characters

For resources on finding your character in the Unicode standard, see Where's Your Character?

Note for Encore2Unicode mapping

If you created a draft mapping using Encore2Unicode, your mapping is already filled in, but it is only a draft. Encore2Unicode makes a "best guess" for each of your characters, and not all of the suggested characters are correct. You must check each line, verify that it is the correct character, and correct it as needed.

You can use the Legacy Mapping Workbook to verify exactly what each character looks like. You may wish to copy the suggested characters into Column F of your workbook, both to help you decide and to keep track. You may also wish to review the instructions in the TECkit tutorial TECkit mapping language conversion to learn more about how to edit your mapping.

What is <> ?

TECkit mappings can be run in two directions: you can convert a file from legacy to Unicode or from Unicode back to legacy. You specify the direction at the time you run the conversion.

You can write a single TECkit mapping that can convert "round-trip", by using "<>".

A TECkit map has 3 kinds of commands or "rules":

Examples:

  1. 0x67 > U+0067 ;one-way conversion of a legacy codepoint to Unicode.
  2. 0x67 < U+0067 ;one-way conversion of a Unicode codepoint to legacy.
  3. 0x67 <> U+0067 ;roundtrip conversion. Basically you can combine the two options above on the same line.
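To make the three rule types concrete, here is a small Python sketch of how each operator contributes to the two conversion directions. It is a toy model only, not the TECkit engine: it assumes simple rules of one legacy byte and one Unicode codepoint, and the names `RULE`, `build_tables`, and the sample rules are made up for this illustration.

```python
import re

# Matches simple rules such as "0x67 <> U+0067 ; comment".
RULE = re.compile(r"^\s*0x([0-9A-Fa-f]+)\s*(<>|<|>)\s*U\+([0-9A-Fa-f]+)")

def build_tables(map_text):
    """Return (legacy-to-Unicode, Unicode-to-legacy) lookup tables."""
    to_unicode, to_legacy = {}, {}
    for line in map_text.splitlines():
        match = RULE.match(line)
        if not match:
            continue  # header lines, comments, and complex rules are skipped
        byte = int(match.group(1), 16)
        op = match.group(2)
        char = chr(int(match.group(3), 16))
        if op in (">", "<>"):   # rule applies when going legacy -> Unicode
            to_unicode[byte] = char
        if op in ("<", "<>"):   # rule applies when going Unicode -> legacy
            to_legacy[char] = byte
    return to_unicode, to_legacy

sample = """
0x67 > U+0067   ; legacy to Unicode only
0x68 < U+0068   ; Unicode to legacy only
0x42 <> U+0256  ; both directions
"""

to_unicode, to_legacy = build_tables(sample)
print(to_unicode)
print(to_legacy)
```

Only the "<>" rule appears in both tables, which is why it is the one to use when you want data to survive a round trip.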

NOTE:

You may use the official Unicode name rather than the Unicode hexadecimal number on the right-hand side of your mapping. Replace spaces in the name with an underscore. For more information about official Unicode names, see Where's Your Character?

Alternative way to write a mapping:

0x67 <> latin_small_letter_g
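If you want to check which codepoint a name refers to, Python's standard unicodedata module can look it up. This is a minimal sketch for checking names outside of TECkit; the helper name `name_to_codepoint` is hypothetical, and it assumes the underscore form is simply the official name with spaces replaced.

```python
import unicodedata

def name_to_codepoint(underscore_name: str) -> str:
    """Turn a name like 'latin_small_letter_g' into 'U+0067'."""
    official_name = underscore_name.replace("_", " ").upper()
    # unicodedata.lookup() raises KeyError if the name is not a Unicode name.
    char = unicodedata.lookup(official_name)
    return f"U+{ord(char):04X}"

print(name_to_codepoint("latin_small_letter_g"))  # U+0067
```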

At the end of this section, you should have all of column F filled out in your Legacy Mapping Workbook. In addition, columns B and F should match the left- and right-hand sides in your TECkit mapping file.
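The final check described above (columns B and F against the left- and right-hand sides) can be sketched as a short Python script. This is an illustration under stated assumptions, not a TECkit tool: the workbook is assumed to be available as a dictionary from legacy code to Unicode number, only simple one-byte rules are recognized, and the names `find_mismatches`, `workbook`, and `draft` are hypothetical.

```python
import re

# Matches simple rules such as "0x42 <> U+0256 ; comment".
RULE = re.compile(r"^\s*0x([0-9A-Fa-f]+)\s*(?:<>|<|>)\s*U\+([0-9A-Fa-f]+)")

def find_mismatches(workbook, map_text):
    """Return legacy codes whose mapping line disagrees with the workbook."""
    in_map = {}
    for line in map_text.splitlines():
        match = RULE.match(line)
        if match:
            in_map[int(match.group(1), 16)] = int(match.group(2), 16)
    # A code missing from the mapping also counts as a mismatch.
    return sorted(b for b, cp in workbook.items() if in_map.get(b) != cp)

# Hypothetical data: Column B -> Column F, and a draft that was not yet edited.
workbook = {0x42: 0x0256, 0x43: 0x0043}
draft = """
0x42    <>    U+0042    ; latin_capital_letter_b  --  B
0x43    <>    U+0043    ; latin_capital_letter_c  --  C
"""

for code in find_mismatches(workbook, draft):
    print(f"0x{code:02x} does not match the workbook")
```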

More Complex Commands

TECkit is capable of handling very complex mappings, including those where it must identify the context of a character before choosing a mapping. For more information, see the TECkit documentation or other people's mappings for examples.

Edit your Mapping Header

The mapping file includes a header. The header contains standardized items which are defined by other organizations with whom we cooperate. Please follow these instructions for filling out the information as accurately as possible.

EncodingName            "(REG_ID)-(FONT_NAME)-(VERSION)"
DescriptiveName         ""
Version                 "0"
Contact                 "mailto:(YOUR_ADDRESS_HERE)"
RegistrationAuthority   "(REG_NAME)"
RegistrationName        "(FONT NAME)-(VERSION)"

Make the following changes to the header:

  1. Replace "(REG_ID)" with "SIL" or other organization identifier.
  2. If not completed already, replace the first "(FONT_NAME)" with your legacy font name. Use the font name as it appears in a Windows font list, but replace any dash with an underscore. Do not use spaces. Note that this is a different format from (FONT NAME) below.
  3. Replace "(VERSION)" in both places with the year the encoding was introduced, not the year of the mapping file. This is usually the same year the font was created.
  4. Add a Descriptive Name for this mapping. It is very helpful to also include the font filename (fontname.ttf) here or in a comment.
  5. Add 1 to Version "0" with each public release of this mapping.
  6. Replace "(YOUR_ADDRESS_HERE)" with your email address.
  7. Replace "(REG_NAME)" with "SIL International" or the other organization that is responsible for the encoding.
  8. Replace (FONT NAME) with the font name or another encoding identifier if appropriate. Use the font name as it appears in a Windows font list. You may use spaces.
  9. Delete all parentheses in the header. Leave the quotes in place on each line.
  10. If your mapping is missing this line, please add it below the TargetFlags line: RHSFlags (ExpectsNFC) ;ExpectsNFC means that when going from Unicode back to legacy, the incoming data will be NFC-normalized before the mapping rules are applied. You can't normalize the LHS legacy data. ExpectsNFC is the normal setting.
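The header steps above can be sketched as a small Python helper that fills in the template. This is only an illustration: the function name `build_header` and all sample values (font name, year, address) are hypothetical, and stripping spaces from the font name in EncodingName is an assumption about how to satisfy "Do not use spaces".

```python
def build_header(reg_id, font_name, year, descriptive_name,
                 version, email, reg_name):
    """Fill in the header template following steps 1-9 above."""
    # Step 2: dashes become underscores; removing spaces is an assumption.
    id_name = font_name.replace("-", "_").replace(" ", "")
    lines = [
        f'EncodingName            "{reg_id}-{id_name}-{year}"',
        f'DescriptiveName         "{descriptive_name}"',
        f'Version                 "{version}"',
        f'Contact                 "mailto:{email}"',
        f'RegistrationAuthority   "{reg_name}"',
        # Step 8: here the font name keeps its spaces.
        f'RegistrationName        "{font_name}-{year}"',
    ]
    return "\n".join(lines)

# Hypothetical values for illustration only.
header = build_header(
    reg_id="SIL",
    font_name="My Font-A",
    year=1998,
    descriptive_name="My Font-A legacy encoding (myfonta.ttf)",
    version=1,
    email="user@example.org",
    reg_name="SIL International",
)
print(header)
```

The result has no parentheses left and keeps the quotes on each line, as step 9 requires.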

Page History

2008-03-17 JW: added RHSFlags instruction
2008-02-26 JW: Reviewed, updated
2005-10-17 JW: Page created


© 2003-2017 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.