NRSI: Computers & Writing Systems
What Number is Your Character?
The complete list of routines in How to Write a Conversion Mapping for your Legacy Font is here.
Goals of this Procedure
This procedure tells you how to find the number your computer is using to store a given character.
This step is part of the procedure How to Write a Conversion Mapping for your Legacy Font.
How Letters are Stored
All the letters stored in a computer are actually numeric – they aren't really stored as tiny abc's. Instead, each letter in the alphabet is assigned a number and that number is stored instead.
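As a small illustration of this idea, the sketch below uses Python, which is not part of the original procedure, to show the number behind a few letters. The helper names char_to_number and number_to_char are made up for this example; ord and chr are Python built-ins.

```python
# A character is stored as a number: ord() reveals the number,
# and chr() turns a number back into its character.

def char_to_number(ch: str) -> int:
    """Return the number used to store a single character."""
    return ord(ch)


def number_to_char(n: int) -> str:
    """Return the character that a stored number stands for."""
    return chr(n)


for letter in "abc":
    print(letter, char_to_number(letter))
```

Running it prints 97, 98 and 99 next to a, b and c.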
However, older systems stored each character in a single byte, which allows only 256 different numbers. That's not enough.
Unicode is a new system that assigns a unique number to every letter in every language, far more than 256. To be compatible, data that was stored using those first 256 numbers must be changed to the new, hopefully permanent, Unicode numbers, unless the numbers already happen to be the correct ones.
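To make the 256 limit concrete, here is a minimal Python sketch, offered as an illustration rather than as part of the procedure. The function name fits_in_one_byte and the sample characters are chosen for this example only.

```python
# Numbers 0..255 fit in a single byte; Unicode numbers can be much larger.

def fits_in_one_byte(ch: str) -> bool:
    """True if the character's number is in the range 0..255."""
    return ord(ch) < 256


# Sample characters written as escapes so the file stays plain ASCII:
# A, e-acute, eng (used in IPA), Devanagari letter A.
samples = ["A", "\u00e9", "\u014b", "\u0905"]

for ch in samples:
    print(ascii(ch), ord(ch), fits_in_one_byte(ch))
```

The first two characters fall within the first 256 numbers; the last two do not, so a one-byte system has no number of its own for them.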
To continue this procedure, you need a recent version of Microsoft Word with a small add-on program called a "macro" installed. To install it, first go to Simple Install for Unicode Macros.
What's the Number of Your Character?
Microsoft Windows and other operating systems have been storing data using Unicode numbers for some time now. To see the Unicode value being used by Microsoft Word for your character:
Open your legacy document in your work folder using Microsoft Word.
The Alt-x procedure is not always reversible, particularly if legacy fonts are involved. Always use a copy of your document and do not save unless you are certain the results are what you desire.
Select a single character so it is highlighted.
Hold down the Alt key and type x. This may show you the number of your character, but if the font is non-standard it may not work at all. To undo, press Alt-x again. If that doesn't work, click the Undo button or press Ctrl-z.
Here is a way that will show you the correct code, regardless of the font.
Re-select a single character so it is highlighted. Look for the Show Unicode icon on your toolbar. Hover the mouse over it and it should say "Show Unicode".
Click the Show Unicode icon on the toolbar.
It will display a number for you. This is the number the computer is using to store the character. It is displayed in hexadecimal (base 16), because the decimal numbers would get too long and cumbersome.
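For readers who want to check a hexadecimal value outside Word, the following Python sketch converts between a character, its four-digit hexadecimal code, and the decimal equivalent. It is only an illustration; the function names are hypothetical and it does not reproduce the macro itself.

```python
# Convert between characters, 4-digit hexadecimal codes and decimal numbers.

def to_hex_code(ch: str) -> str:
    """Return the character's number as at least 4 hexadecimal digits."""
    return f"{ord(ch):04X}"


def from_hex_code(hex_digits: str) -> str:
    """Return the character for a hexadecimal code such as '014B'."""
    return chr(int(hex_digits, 16))


print(to_hex_code("A"))        # 0041
print(int("F021", 16))         # 61473, the same number in decimal
print(ascii(from_hex_code("014B")))
```

The code F021 takes four hexadecimal digits but five decimal digits, which is the kind of saving the hexadecimal display provides.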
Other Show Unicode Options
You can select more than one character and Show Unicode will give you the list.
You can place the cursor between two characters, and Show Unicode will give you all the characters before and after the current position, on the same line.
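The idea of listing the codes for several characters at once can be sketched in Python as follows. This is not the macro's code, and the exact layout of the macro's output is not specified here; the function name list_codes is an assumption made for illustration.

```python
from typing import List

# Produce one hexadecimal code per character in a piece of text,
# similar in spirit to selecting several characters at once.

def list_codes(selection: str) -> List[str]:
    """Return the 4-digit hexadecimal code of each character, in order."""
    return [f"{ord(ch):04X}" for ch in selection]


print(list_codes("ab"))
print(list_codes("\u014ba"))  # eng followed by a
```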
What's the Right Number?
Now that I know the Unicode number of my character, what do I do with it?
If your number is 4 digits and starts with F0, your data was likely entered with a Symbol font. This data needs to be converted to Unicode. Here is what is currently happening: Microsoft Word converts all incoming symbol font data into a special Private Use Area (PUA). Thus, these numbers all start with F. This is a problem you need to address. We will take a closer look at the PUA later.
If your number is 4 digits starting with 00, it is treated as English (strictly speaking, Basic Latin). If your data really is English, the number is probably correct. However, if your data is not English or related to English, then this data needs to be converted to Unicode.
If your number is 4 digits, is "0100" or higher, and does not start with F0 as described above, this is already Unicode data. You don't need to convert it. To check how the official Unicode Standard defines a certain character, you can search for it by number here: Unicode Charts.
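The three cases above can be sketched as a small Python function. This is a rough aid, not an official rule set: the function name classify_code is hypothetical, and the "legacy byte" it reports rests on the assumption that the last two digits of an F0 code are the original symbol-font byte.

```python
# Sort a 4-digit hexadecimal code into the three cases described above.
# The F0 check must come first, because F0xx is also "0100 or higher".

def classify_code(code_hex: str) -> str:
    value = int(code_hex, 16)
    if 0xF000 <= value <= 0xF0FF:
        # Assumption: the last two digits are the original symbol-font byte.
        legacy_byte = value - 0xF000
        return f"symbol font (PUA), legacy byte {legacy_byte:02X}: needs conversion"
    if value <= 0x00FF:
        return "starts with 00: correct for English, otherwise needs conversion"
    return "0100 or higher: already Unicode"


for code in ["F041", "0041", "014B"]:
    print(code, "->", classify_code(code))
```

A code starting with 00 still needs a human decision, since only you know whether the text really is English.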
If you entered regular English data with a normal font, the numbers the computer uses to store your data should already be correct. But if you work in another language or use unusual characters, such as IPA (International Phonetic Alphabet) or a minority or non-Roman language, you may want to consider converting important data to Unicode.
2008-02-21 JW: reviewed
2005-07-26 JW: Page created