
NRSI: Computers & Writing Systems


Short URL: https://scripts.sil.org/UTTWriteMap

How to Write a Conversion Mapping for your Legacy Font

Converting your Legacy Data to Unicode

Joan Wardell, 2005-09-16

What is this Procedure?

This procedure explains how to convert a simple document typed in a legacy font to Unicode. A legacy font[1], as the term is used on these webpages, is one that was created to work around the restrictions of the original computer design. It implies a non-standard encoding.

A legacy file is any file typed with a legacy font, and thus having a non-standard encoding.

If you do not understand these terms yet, more information will follow, particularly in steps 1-3.

Do I Need This Procedure?

If you are uncertain whether you need to convert your files to Unicode, follow the first three steps for a quick look at your data and your font. This procedure is needed only if you have used one or more legacy fonts, which are normally provided for a special need such as entering IPA or minority language data. If you work only in English, or use only the fonts that come standard with your operating system, you probably do not need this procedure.

If you decide you need to convert your important data files to Unicode, this webpage explains the procedure for writing a TECkit mapping file and converting a file to Unicode, using your mapping. If you are a Paratext user, consider using the Paratext Converter rather than these instructions.

Note:

You should not need to write a mapping for any major national language if your data can be displayed using the standard fonts that came with the operating system. Those conversions are built into Microsoft Word and other programs, which identify and convert the data automatically. We do not address that procedure in these pages.

These instructions are for Windows XP.

Writing a TECkit mapping is a lengthy procedure that increases in difficulty at each step. Expect to spend several weeks working through these instructions. In them, you will create a TECkit mapping that converts a simple data file typed with a legacy font to Unicode. Your mapping may then be used to convert more complex documents such as Word or SFM documents, but that procedure is not covered here.
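To make the idea of a conversion mapping concrete, here is a minimal Python sketch of what any byte-to-Unicode mapping does in principle. It is not TECkit and does not use TECkit's mapping syntax. The table `LEGACY_TO_UNICODE` and the function `convert_legacy_bytes` are hypothetical names, and the byte assignments are invented for illustration rather than taken from any real font.

```python
# Hypothetical mapping table: legacy byte value -> Unicode character.
# These assignments are invented for illustration only. A real table
# comes from inspecting your own legacy font.
LEGACY_TO_UNICODE = {
    0x40: "\u0259",  # the '@' slot reused for LATIN SMALL LETTER SCHWA
    0x23: "\u014B",  # the '#' slot reused for LATIN SMALL LETTER ENG
    0x7E: "\u0303",  # the '~' slot reused for COMBINING TILDE
}


def convert_legacy_bytes(data: bytes) -> str:
    """Convert legacy-encoded bytes to a Unicode string.

    Bytes listed in the table are remapped, other ASCII bytes pass
    through unchanged, and anything else becomes U+FFFD so that
    unmapped characters are easy to spot.
    """
    out = []
    for b in data:
        if b in LEGACY_TO_UNICODE:
            out.append(LEGACY_TO_UNICODE[b])
        elif b < 0x80:
            out.append(chr(b))
        else:
            out.append("\uFFFD")
    return "".join(out)


if __name__ == "__main__":
    converted = convert_legacy_bytes(b"b@#a")
    # Print code points rather than glyphs so the result is readable
    # regardless of the console font.
    print([f"U+{ord(c):04X}" for c in converted])
```

A real mapping usually needs more than a one-to-one table (for example, reordering diacritics), which is part of why the full procedure takes time.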

Steps

For best results, follow these instructions in order. Each step presumes you know how to do the previous ones. You will need a legacy font and a file typed with that font (a legacy document).

In other words, you will need a non-English document and the font you used to type that document. You will find it helpful to have a work folder and a shortcut on your desktop to your fonts folder (possibly C:\Windows\Fonts).

  1. What Number is Your Character?
  2. What's in Your Font?
  3. How to Identify Legacy and Other Fonts
  4. How to Setup Your Computer to do a Conversion Mapping
  5. What's in Your File? (includes How to Convert a Hex Number to a Decimal Number, Perl character count, and DOS character count)
  6. Creating a Chart of Your Legacy Mapping
  7. Create Your Draft TECkit Mapping File
  8. Edit Your Mapping (includes Where's Your Character? Unicode Resources)
  9. Compile and Run Your Mapping
  10. Test Your Mapping (includes What's in Your Unicode File?)
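Step 5 involves counting the characters in your file and converting between hex and decimal numbers. The linked pages describe Perl and DOS approaches. As a rough illustration only, the same ideas can be sketched in Python. The function names `count_bytes` and `report` are hypothetical, and the sample bytes are made up.

```python
from collections import Counter


def count_bytes(data: bytes) -> Counter:
    """Count how many times each byte value occurs in the data."""
    return Counter(data)


def report(data: bytes) -> list:
    """Return one line per byte value: hex, decimal, and count."""
    lines = []
    for value, n in sorted(count_bytes(data).items()):
        lines.append(f"0x{value:02X}  {value:3d}  {n}")
    return lines


if __name__ == "__main__":
    # Made-up sample data. For a real file, read it in binary mode:
    #     data = open("mydata.txt", "rb").read()
    sample = b"aab\x8c\x8c\x8c"
    for line in report(sample):
        print(line)

    # Hex to decimal and back, using Python built-ins.
    print(int("8C", 16))      # hex string -> decimal number
    print(format(140, "X"))   # decimal number -> hex string
```

Reading the file in binary mode matters here: the goal is to see the raw byte values your legacy font assigns, not an interpretation of them under some other encoding.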

Page History

2008-02-27 JW: under review

2005-09-16 JW: Page created


[1] The term "hacked" may also be used, but on this site it does not carry a pejorative sense, nor does it imply that the font is necessarily of poor quality or was modified illegally.

© 2003-2018 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.