NRSI: Computers & Writing Systems

Short URL: http://scripts.sil.org/PUA_FAQ

Private Use Area - Frequently Asked Questions

Kent Spielmann, Lorna A. Priest, Bob Hallissy, 2007-06-14

This document provides answers to frequently asked questions about SIL’s Private Use Area (PUA).

Note

For a deeper understanding of Unicode and the PUA, see the Unicode website.

Question: What is the PUA?

Answer: The Private Use Area (PUA) is a range of Unicode code points (U+E000–U+F8FF, plus planes 15 and 16) reserved for private definition and use within an organisation or corporation, for example to create proprietary, non-standard character definitions. Its users may include software developers and end users who need a special set of characters for their own purposes. There are 6,400 PUA code points in the Basic Multilingual Plane, and a further 131,068 in planes 15 and 16. Although this may seem like a lot, this “code space” needs to be managed so that the available code points do not run out in the long term.

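The PUA range in the Basic Multilingual Plane is from U+E000 to U+F8FF. In addition, Unicode reserves the supplementary planes 15 (U+F0000–U+FFFFD) and 16 (U+100000–U+10FFFD) for private use. The last two code points of each of these planes (U+FFFFE–U+FFFFF and U+10FFFE–U+10FFFF) are noncharacters, not private-use characters.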
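The ranges above can be checked mechanically. The following is a minimal Python sketch, not part of any SIL tool; the names `PUA_RANGES`, `is_private_use` and `range_size` are illustrative. It treats the ranges as inclusive and excludes the two noncharacters at the end of planes 15 and 16, which also reproduces the code point counts quoted in the previous answer.

```python
# Private Use Area ranges (inclusive bounds).
# Planes 15 and 16 stop at ...FFFD because ...FFFE and ...FFFF are noncharacters.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # plane 15 (Supplementary Private Use Area-A)
    (0x100000, 0x10FFFD),  # plane 16 (Supplementary Private Use Area-B)
]


def is_private_use(cp: int) -> bool:
    """Return True if the code point lies in one of the PUA ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def range_size(lo: int, hi: int) -> int:
    """Number of code points in an inclusive range."""
    return hi - lo + 1


if __name__ == "__main__":
    print(range_size(0xE000, 0xF8FF))                              # 6400
    print(sum(range_size(lo, hi) for lo, hi in PUA_RANGES[1:]))    # 131068
```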

Question: What are Unicode codepoints?

Answer: Code points are numbers, much like ASCII codes, that represent the symbols and characters found in the writing systems of languages throughout the world. Where ASCII offers 128 codes (and 8-bit encodings 256), the Basic Multilingual Plane has 65,536 code values (64K). By convention, code points in this plane are written as 4 hexadecimal digits, which are enough to cover all 64K values. For example, U+E000 represents the code value E000 (decimal 57,344), which is the lowest number in the PUA range.
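Converting between “U+” notation and the underlying number is a simple base-16 conversion. This is a minimal Python sketch using only built-ins; the helper names `parse_code_point` and `to_notation` are illustrative, not from any SIL library.

```python
def parse_code_point(notation: str) -> int:
    """Convert 'U+E000'-style notation to an integer code point."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"expected U+ notation, got {notation!r}")
    return int(notation[2:], 16)


def to_notation(cp: int) -> str:
    """Format an integer code point as U+ notation with at least 4 hex digits."""
    return f"U+{cp:04X}"


cp = parse_code_point("U+E000")
print(cp)                   # 57344
print(to_notation(cp))      # U+E000
print(chr(cp) == "\ue000")  # True
```

The `04X` format pads short values (U+0041) to four digits but lets larger values grow, so the same helper also handles the 5- and 6-digit code points of the supplementary planes.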

Unicode also assumes smart font technology in the software that uses it. This means that Unicode needs to assign only one code point to a letter or symbol whose shape may vary with context. For example, a smart font will place an accent correctly on a letter, whether lowercase or uppercase, wide or thin. It will even substitute a dotless i (or another letter variant) when appropriate.

Codepoints in the 16 supplementary planes may be represented using 5 or 6 hexadecimal digits. For example, the lowest codepoint in supplementary plane 15 is U+F0000, and the lowest codepoint in plane 16 is U+100000.
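Because each plane holds 65,536 (0x10000) code points, the plane number is simply the code point shifted right by 16 bits. A minimal Python sketch; `plane_of` is an illustrative name.

```python
def plane_of(cp: int) -> int:
    """Return the Unicode plane (0-16) that contains a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    return cp >> 16


for cp in (0xE000, 0xF0000, 0x100000):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
# U+E000 is in plane 0
# U+F0000 is in plane 15
# U+100000 is in plane 16
```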

Many of the supplementary planes are still unassigned and are held in reserve for future extensions. The SIL PUA strategy does include using the PUA in supplementary planes 15 and 16 for cross-mapping between the various entity allocations in the Basic Multilingual Plane PUA.

Question: What is SIL’s PUA assignment strategy?

Answer: SIL’s corporation-wide PUA assignments use the code point range U+F100–U+F8FF. Individual SIL field entities may also make independent PUA character assignments in the range U+E000–U+EFFF.
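The two ranges can be expressed as a small classifier, for instance when auditing which part of the PUA a font or data file draws on. This is a hypothetical helper written for illustration, covering only the Basic Multilingual Plane PUA; code points in U+F000–U+F0FF fall outside both ranges named above.

```python
def sil_pua_region(cp: int) -> str:
    """Classify a BMP code point under the SIL PUA strategy (illustrative only)."""
    if 0xE000 <= cp <= 0xEFFF:
        return "entity"      # local SIL entities may assign freely
    if 0xF100 <= cp <= 0xF8FF:
        return "corporate"   # reserved and managed corporation-wide
    if 0xE000 <= cp <= 0xF8FF:
        return "other PUA"   # U+F000..U+F0FF: outside both ranges
    return "not PUA"


print(sil_pua_region(0xE123))  # entity
print(sil_pua_region(0xF200))  # corporate
```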

SIL’s corporate strategy is based on PUA Corporate Strategy.

Question: What are SIL’s PUA assignments?

Answer: The SIL PUA strategy allows local SIL entities to make free use of the lower portion of the PUA range, U+E000–U+EFFF, while the upper portion, U+F100–U+F8FF, is reserved and managed for corporation-wide use.

The SIL PUA committee publishes documentation on approved assignments of characters to Corporate-controlled portions of the PUA.

The following table shows how Unicode private-use character assignments are arranged within the SIL corporate-wide portion of the PUA.

F000..F0FF (reserved)
F100..F13F specials
F140..F15F (reserved)
F160..F17F combining marks
F180..F1EF modifier letters (e.g. superscripts)
F1F0..F1FF (reserved)
F200..F2FF Latin
F300..F31F Hebrew
F320..F33F Cyrillic
F340..F34F (reserved)
F350..F6FF other non-Latin alphabetic characters
F700..F8FF (reserved)
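The table can be turned into a lookup from a code point to its block. The sketch below copies the rows of the table into a Python list and uses the standard `bisect` module to find the block whose start is closest below the code point; `SIL_CORPORATE_BLOCKS` and `corporate_block` are illustrative names, and the table should be kept in sync with SIL’s published documentation by hand.

```python
from bisect import bisect_right
from typing import Optional

# (start, end, use) rows copied from the table above; starts are ascending.
SIL_CORPORATE_BLOCKS = [
    (0xF000, 0xF0FF, "(reserved)"),
    (0xF100, 0xF13F, "specials"),
    (0xF140, 0xF15F, "(reserved)"),
    (0xF160, 0xF17F, "combining marks"),
    (0xF180, 0xF1EF, "modifier letters"),
    (0xF1F0, 0xF1FF, "(reserved)"),
    (0xF200, 0xF2FF, "Latin"),
    (0xF300, 0xF31F, "Hebrew"),
    (0xF320, 0xF33F, "Cyrillic"),
    (0xF340, 0xF34F, "(reserved)"),
    (0xF350, 0xF6FF, "other non-Latin alphabetic characters"),
    (0xF700, 0xF8FF, "(reserved)"),
]
_STARTS = [start for start, _, _ in SIL_CORPORATE_BLOCKS]


def corporate_block(cp: int) -> Optional[str]:
    """Return the table entry covering cp, or None if cp is outside the table."""
    i = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0:
        start, end, use = SIL_CORPORATE_BLOCKS[i]
        if start <= cp <= end:
            return use
    return None


print(corporate_block(0xF170))  # combining marks
```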

Roadmap (Revised 2003-8-11)

Question: SIL PUA version — what is that?

Answer:

The problem

Over time the list of characters assigned to the SIL PUA changes. These changes happen for three main reasons:

  1. New characters added to the PUA
  2. Deprecation of existing PUA characters because they’ve been accepted into the Unicode standard
  3. Bugs and mistakes in our documentation or character descriptions.

Changes to the PUA assignments will eventually result in changes to products that use these characters, including keyboards, fonts and encoding mapping tables (see footnote 1). Additions or minor bug fixes (reasons 1 and 3) are not likely to affect many users (unless, of course, the addition or fix was something you specifically needed).

Deprecation of PUA characters (2), however, can make a big difference for many users, and you need to exercise caution when installing updated keyboards or mapping tables. For example, when a character is accepted into Unicode, we will release new keyboards that now reference the official character code rather than the PUA code. If you did part of your project with the old keyboard and part with the new, then you could have inconsistent data: some of it using the PUA code and some using the official character code.
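When a project may mix data typed with old and new keyboards, a practical first step is to list every private-use character in the text. This is a minimal Python sketch relying on the standard `unicodedata` module, where private-use code points have the general category `Co`; `find_pua` is an illustrative name. It only locates PUA characters: deciding which of them are deprecated would require SIL’s published assignment documentation, which is not reproduced here.

```python
import unicodedata
from typing import List, Tuple


def find_pua(text: str) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXX') for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # Co = Private Use
    ]


print(find_pua("ab\ue000c"))  # [(2, 'U+E000')]
```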

A critical question in all this is how to tell which keyboards, for example, are compatible with each other, that is, which ones generate the same code (PUA or official) for a given character.

The solution

We have decided to implement a versioning system that will allow you to easily determine the compatibility (at least as far as PUA code points are concerned) between various components. There will be two parts to the PUA version identifier:

  • The primary moniker will be the first two components (i.e., major.minor) of a Unicode version number (http://www.unicode.org/standard/versions/). The release of a new (major or minor) version of the standard is an opportunity for some of our PUA characters to be officially accepted into Unicode, thus increasing the list of deprecated PUA characters.
  • The secondary moniker will be a lowercase letter (a, b, c…) representing revisions since that Unicode release. Such revisions will typically be additions or bug fixes (reasons 1 and 3 above).
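The two-part identifier can be parsed into a tuple that sorts in release order. This is a minimal Python sketch assuming identifiers look like `5.0` or `5.0c` (digits, a dot, digits, and at most one lowercase letter); `PuaVersion` and `parse_pua_version` are illustrative names, not part of any SIL tool.

```python
import re
from typing import NamedTuple


class PuaVersion(NamedTuple):
    major: int
    minor: int
    revision: str  # "" for the base release, else "a", "b", "c", ...


# Assumed shape: major.minor plus an optional single revision letter.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)([a-z]?)")


def parse_pua_version(text: str) -> PuaVersion:
    """Parse an SIL PUA version identifier such as '5.0c'."""
    m = _VERSION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an SIL PUA version identifier: {text!r}")
    return PuaVersion(int(m.group(1)), int(m.group(2)), m.group(3))


# Tuples compare field by field, and "" sorts before "a", so 5.0 < 5.0a.
print(sorted(["5.0c", "4.1e", "5.0", "5.0a"], key=parse_pua_version))
# ['4.1e', '5.0', '5.0a', '5.0c']
```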

Examples

Looking at our history of PUA documentation available on SIL Corporate PUA Assignments, we see

  • SIL PUA 5.0c 2007-01-19
  • SIL PUA 5.0b 2006-08-14
  • SIL PUA 5.0a 2006-06-12
  • SIL PUA 5.0 2006-01-09
  • SIL PUA 4.1e 2005-09-08

From this we can determine:

  • Between PUA versions 4.1e and 5.0, some PUA characters gained official Unicode assignments and so the PUA codepoints became deprecated.
  • Between PUA versions 5.0 and 5.0c, no PUA characters were deprecated, though some might have been added.

Therefore we conclude that we should think twice before using, for example, both 4.1e and 5.0 keyboards on the same project. But we could probably use any keyboard from 5.0 to 5.0c interchangeably.
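The rule of thumb drawn from these examples can be written as a comparison of primary monikers. This is a hypothetical Python helper, not an SIL tool: matching primary monikers mean no PUA characters were deprecated between the two versions, but later revisions may still contain additions that earlier components lack.

```python
import string


def primary_moniker(version: str) -> str:
    """Strip the trailing revision letter, e.g. '5.0c' -> '5.0'."""
    return version.strip().rstrip(string.ascii_lowercase)


def probably_compatible(v1: str, v2: str) -> bool:
    """True if both versions share a primary moniker (no deprecations between them)."""
    return primary_moniker(v1) == primary_moniker(v2)


print(probably_compatible("5.0", "5.0c"))   # True
print(probably_compatible("4.1e", "5.0"))   # False
```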

For further reading

The PUA versioning scheme is one part of the SIL PUA Deprecation Strategy. The complete strategy document is available: A strategy for deprecating SIL PUA characters.

Question: What is the PUA Committee?

Answer: The SIL PUA Committee is chartered by SIL International as an advisory body to the Non-Roman Script Initiative (NRSI) to develop and promote policy for managed use of the Unicode Private Use Area (PUA) within SIL. Because of the far-reaching effects of its decisions, the NRSI requested that the SIL PUA Committee be set up so it could hold itself accountable to the rest of the corporation. The committee is made up of members that represent various academic domains within SIL. The committee is also the point of contact within SIL and with other organizations regarding the PUA.

Question: How do I use the PUA?

Answer: The SIL PUA Committee recommends the following guidelines to individuals and entities within SIL who desire to add custom characters to Unicode fonts.

Individual users should check with their local SIL entity to see whether it has published or unpublished PUA guidelines for their area.

Using appropriate font software, add characters only to the lower PUA region U+E000–U+EFFF.
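If you keep your own record of the characters you have added, choosing the next code point and checking that it stays in the lower region is easy to automate. The helpers below are hypothetical, written in Python for illustration; they assume your record is simply a collection of integer code points and are not part of any font editor.

```python
from typing import Iterable, Optional

# Lower PUA region recommended above for entity and individual use.
ENTITY_START, ENTITY_END = 0xE000, 0xEFFF


def in_entity_region(cp: int) -> bool:
    """True if cp lies in the lower PUA region U+E000-U+EFFF."""
    return ENTITY_START <= cp <= ENTITY_END


def next_free(assigned: Iterable[int]) -> Optional[int]:
    """Return the lowest unassigned code point in the region, or None if full."""
    used = set(assigned)
    for cp in range(ENTITY_START, ENTITY_END + 1):
        if cp not in used:
            return cp
    return None


print(f"U+{next_free([0xE000, 0xE001]):04X}")  # U+E002
```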

If you need characters that you expect will be included in published works, particularly when developing a new orthography, contact your local SIL entity or the to see if there are characters already developed that will suit your needs.

Question: How do I add characters to my entity block?

Answer: First, check thoroughly that your character is not already in Unicode (see Unicode 8.0 Latin and Cyrillic characters – sorted, Unicode Encoding Resources or http://www.unicode.org/charts) or within SIL’s Corporate PUA Assignments. Second, if you do not find a suitable character, contact your entity Unicode representative.
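For a quick first check by name, Python’s standard `unicodedata` module can look up a character by its official Unicode name. This sketch is only a convenience: the database reflects whichever Unicode version your Python was built with (shown by `unicodedata.unidata_version`), and a character may exist under a name you did not think to try, so the charts remain the authoritative check. `find_by_name` is an illustrative name.

```python
import unicodedata
from typing import Optional


def find_by_name(name: str) -> Optional[str]:
    """Return the code point(s) for a Unicode character name, or None if unknown."""
    try:
        chars = unicodedata.lookup(name)
    except KeyError:
        return None
    # lookup() can also return multi-character named sequences.
    return " ".join(f"U+{ord(c):04X}" for c in chars)


print(find_by_name("LATIN SMALL LETTER ENG"))  # U+014B
print(unicodedata.unidata_version)  # Unicode version of this Python's database
```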

Question: Where should I go to get help with organising my entity block?

Answer: A document detailing this information is presently being worked on.

Question: How do I submit characters for inclusion in the SIL corporate PUA (the corporation-wide block)?

Answer: Use the SIL Private Use Area Registration Form. Only proposals submitted by SIL Entities or FOBAI member organizations will be considered.

Question: How do I get characters added to Unicode?

Answer: The easiest way for most SIL and FOBAI members is to submit a proposal for their inclusion in the SIL corporate PUA (see SIL Private Use Area Registration Form). The submission form includes the option of requesting that a Unicode proposal be made for your character(s).

Question: What is the relationship between SIL’s PUA Committee and the UTC? Is there any conflict between this strategy and the Unicode guidelines?

Answer: The PUA Committee operates within the guidelines described in the “Private Use Area: U+E000–U+F8FF” section of Special Areas and Format Characters (Unicode Consortium, 2003, The Unicode Standard Version 4.0, Boston, MA: Addison-Wesley) and within the guidelines of the Unicode Technical Committee (UTC), and it aims to work in tandem with the UTC.

Page History

2010-01-05 LP: changed IFOBA to FOBAI
2007-06-14 BH/LP: added Q&A about versioning

2005-06-24 LP: added IFOBA, where relevant

2004-01-27 LP: page creation






1 SIL Unicode fonts are designed as much as possible to be backwards compatible with older versions of the PUA by rendering a deprecated PUA character or its Unicode replacement the same way.

© 2003-2017 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.