Short URL: http://scripts.sil.org/UTTUsingUnicodeMacros

What Number is Your Character?

How to Find the Numeric Value of a Character

Joan Wardell, 2005-07-26

Goals of this Procedure

This procedure tells you how to find the number your computer is using to store a given character.

This step is part of the procedure How to Write a Conversion Mapping for your Legacy Font.

How Letters are Stored

All the letters stored in a computer are actually numeric – they aren't really stored as tiny abc's. Instead, each letter in the alphabet is assigned a number and that number is stored instead.
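Here is a small Python sketch of this idea. Python is used only for illustration and is not part of the Word procedure. The built-in `ord()` returns the number stored for a character and `chr()` turns a number back into a character. The helper name `character_numbers` is hypothetical.

```python
def character_numbers(text: str) -> list[tuple[str, int]]:
    """Pair each character with the number the computer stores for it."""
    return [(ch, ord(ch)) for ch in text]

print(character_numbers("abc"))  # [('a', 97), ('b', 98), ('c', 99)]
print(chr(98))                   # b
```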

However, older computer systems stored each letter in a single byte, which can hold only 256 different values. That's not enough for all the world's writing systems.
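A minimal Python sketch of the 256 limit, for illustration only. One byte holds 8 bits, so 2**8 values are possible. The helper name `fits_in_one_byte` is hypothetical, and Latin-1 is used here as just one example of a single-byte encoding.

```python
# One byte holds 8 bits, so it can represent 2**8 distinct values (0-255).
print(2 ** 8)  # 256

def fits_in_one_byte(ch: str) -> bool:
    """True if the character's number is small enough for one byte."""
    return ord(ch) <= 0xFF

print(fits_in_one_byte("e"))  # True  (number 0065)
print(fits_in_one_byte("ɔ"))  # False (number 0254, IPA open o)

# A single-byte encoding such as Latin-1 simply has no slot for ɔ.
try:
    "ɔ".encode("latin-1")
except UnicodeEncodeError:
    print("ɔ cannot be stored in one Latin-1 byte")
```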

Unicode is a newer standard that assigns a unique number to every character in every writing system, far more than 256. To be compatible, data that was stored using the old 256 values, especially data typed with a custom legacy font, usually needs to be changed to new, permanent Unicode numbers, unless the numbers are already the correct ones.
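The idea behind converting legacy data can be sketched as a lookup table from old byte values to Unicode numbers. This is a minimal Python illustration under stated assumptions: the legacy font, its byte positions, and the names `LEGACY_TO_UNICODE` and `convert_legacy` are all hypothetical, and a real conversion uses a proper conversion mapping rather than this code.

```python
# HYPOTHETICAL legacy font: these byte positions are invented for illustration.
LEGACY_TO_UNICODE = {
    0x7B: 0x0254,  # the "{" position drew ɔ (LATIN SMALL LETTER OPEN O)
    0x5B: 0x025B,  # the "[" position drew ɛ (LATIN SMALL LETTER OPEN E)
}

def convert_legacy(data: bytes) -> str:
    """Replace each mapped legacy byte with its Unicode character.

    Unmapped bytes keep their own number, i.e. they are assumed
    to be correct already.
    """
    return "".join(chr(LEGACY_TO_UNICODE.get(b, b)) for b in data)

print(convert_legacy(b"b{[k"))  # bɔɛk
```

Passing unmapped bytes through unchanged is only right for plain English letters that a legacy font keeps in their standard positions; a real mapping would list every position the font redefines.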

To continue this procedure, you need a recent version of Microsoft Word with some small add-on programs called "macros" installed. To install them, first go to Simple Install for Unicode Macros.

What's the Number of Your Character?

Microsoft Windows and other operating systems have been storing data using Unicode numbers for some time now. To see the Unicode value being used by Microsoft Word for your character:

Open your legacy document in your work folder using Microsoft Word.

Warning

The Alt-x procedure is not always reversible, particularly if legacy fonts are involved. Always use a copy of your document and do not save unless you are certain the results are what you desire.

Select a single character so it is highlighted.

Hold down the Alt key and type x. This may show you the number of your character, unless the font is non-standard, in which case it may not work at all. To undo, press Alt-x again. If that doesn't work, click Edit > Undo Toggle Character Code or press Ctrl-z.

Here is a way that will show you the correct code, regardless of the font.

Re-select a single character so it is highlighted. Look for the Show Unicode icon on your toolbar; when you hover the mouse over it, it should say "Show Unicode".

Click the Show Unicode icon on the toolbar.

It will display a number. This is the number the computer is using to store the character. It is displayed in hexadecimal[1], the notation used throughout the Unicode Standard, because it is shorter than decimal and easier to match against the Unicode charts.
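Python can show the same kind of number, which may help when checking results outside Word. This is a sketch only; `show_unicode` is a hypothetical name and is not the macro's code. It also works one number through by hand: 0254 in hexadecimal is 2 × 256 + 5 × 16 + 4 = 596 in decimal.

```python
def show_unicode(ch: str) -> str:
    """Return the character's number as hexadecimal text, padded to at least 4 digits."""
    return f"{ord(ch):04X}"

print(show_unicode("ɔ"))  # 0254
print(show_unicode("a"))  # 0061
print(ord("ɔ"))           # 596, the same number written in decimal
print(int("0254", 16))    # 596, converting the hexadecimal text back
```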

Click OK to close.

Other Show Unicode Options

You can select more than one character, and Show Unicode will list the number of each one.

You can place the cursor between two characters, and Show Unicode will list all the characters before and after the current position on the same line.
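The two options can be modeled in Python as a rough illustration. This is a hypothetical sketch: `list_codes` and `codes_around_cursor` are invented names, the cursor is assumed to be an index into the current line, and the real macro's output format may differ.

```python
def list_codes(selection: str) -> list[str]:
    """Return 'character U+XXXX' for every character in the selection."""
    return [f"{ch} U+{ord(ch):04X}" for ch in selection]

def codes_around_cursor(line: str, cursor: int) -> tuple[list[str], list[str]]:
    """Split the line at the cursor and list the codes before and after it."""
    return list_codes(line[:cursor]), list_codes(line[cursor:])

print(list_codes("bɔ"))  # ['b U+0062', 'ɔ U+0254']
before, after = codes_around_cursor("bɔk", 1)
print(before)  # ['b U+0062']
print(after)   # ['ɔ U+0254', 'k U+006B']
```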

What's the Right Number?

Now that I know the Unicode number of my character, what do I do with it?

If your number is 4 digits and starts with F0, your data was likely entered with a symbol font. This data needs to be converted to Unicode. This is how it is currently being handled: Microsoft Word converts all incoming symbol font data into a special part of the Private Use Area (PUA)[2]. Thus, these numbers all start with F0. This is a problem you need to address. We will take a closer look at the PUA later.
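If Word has moved symbol-font data into the F0 range, the original font position can usually be recovered by subtracting F000. This assumes Word stored font byte N as F000 + N (for example, F07B for byte 7B); that recovered value is what a legacy conversion mapping would look up. A Python sketch, using the hypothetical name `legacy_byte_from_symbol_pua`:

```python
def legacy_byte_from_symbol_pua(code_point: int) -> int:
    """Recover the original symbol-font byte from an F0xx number.

    Assumes Word stored font byte N as F000 + N.
    """
    if not 0xF000 <= code_point <= 0xF0FF:
        raise ValueError(f"{code_point:04X} is not in the F0xx range")
    return code_point - 0xF000

print(f"{legacy_byte_from_symbol_pua(0xF07B):02X}")  # 7B
```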

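If your number is 4 digits starting with 00, it falls in the range inherited from the older 256-value systems: Basic Latin (0000 to 007F, which covers English) and Latin-1 Supplement (0080 to 00FF). If your data is English, it's probably correct. However, if your data is not English or related to English, for example if it was typed with a legacy font that draws other letters in these positions, then this data probably needs to be converted to Unicode.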

If your number is 4 digits, is at or above 0100, and does not start with F0, this is already Unicode data. You don't need to convert it. To check how the official Unicode Standard defines a certain character, you can search for it by number here: Unicode Charts.
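The three rules of thumb above can be summarized as a small decision function. This Python sketch is an illustration, not an official test: `classify` is a hypothetical name, and it treats every other number at or above 0100 as ordinary Unicode, so other Private Use Area values (such as SIL's) are not flagged.

```python
def classify(code_point: int) -> str:
    """Apply this page's rules of thumb to a number reported by Show Unicode."""
    if 0xF000 <= code_point <= 0xF0FF:
        return "symbol-font data in Microsoft's PUA: convert it"
    if code_point <= 0x00FF:
        return "00xx range: probably correct for English, otherwise check it"
    return "already Unicode: look it up in the Unicode charts"

for cp in (0xF07B, 0x0061, 0x0254):
    print(f"{cp:04X}: {classify(cp)}")
```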

Conclusion

If you entered regular English data with a normal font, the numbers the computer uses to store your data should already be correct. But if you work in another language or use unusual characters, such as IPA (International Phonetic Alphabet) or a minority or non-Roman language, you may want to consider converting important data to Unicode.

Page History

2008-02-21 JW: reviewed

2005-07-26 JW: Page created


[1] hexadecimal: The hexadecimal number system uses the digits 0-9 plus the letters A-F to represent numbers in base 16. A-F represent 10, 11, 12, 13, 14, and 15 respectively, each using a single character instead of two.
[2] The range Microsoft uses within the PUA is separate from SIL's Private Use Area. The F0xx PUA numbers are like a temporary holding area for data that Microsoft cannot identify. You don't want to leave important data in the F0 area. However, you may need to use SIL's Private Use Area for some characters.

© 2003-2017 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.