Computers & Writing Systems
Private Use Area - Frequently Asked Questions
This document provides answers to frequently asked questions about SIL’s Private Use Area (PUA).
For a greater understanding of Unicode and the PUA, the Unicode site may be of further interest.
Question: What is the PUA?
Answer: The Private Use Area (PUA) is a range of Unicode codepoints (U+E000–U+F8FF, plus planes 15 and 16) that are reserved for private definition and use within an organisation or corporation for creating proprietary, non-standard character definitions. This might include use by software developers and end users who need a special set of characters for their own purposes. There are 6,400 PUA code points available in the Basic Multilingual Plane, and roughly 128K more in planes 15 and 16. Although this may seem like a lot, this “code space” needs to be managed so that the available code points do not run out in the long term.
The PUA range in the Basic Multilingual Plane is from U+E000 to U+F8FF. In addition, Unicode reserves the so-called “supplementary planes” 15 (U+F0000–U+FFFFD) and 16 (U+100000–U+10FFFD) for private use; the last two code points of each plane are noncharacters and are not part of the PUA.
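The ranges above can be checked mechanically. The following is a minimal Python sketch, not part of any SIL tooling; the names PUA_RANGES, is_private_use and bmp_pua_size are chosen here for illustration. It assumes the supplementary private-use ranges end at U+FFFFD and U+10FFFD, because the Unicode Standard treats the final two code points of every plane as noncharacters.

```python
# Private-use ranges (inclusive). The last two code points of planes 15 and 16
# are noncharacters, so the supplementary ranges stop at ...FFFD.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # supplementary plane 15
    (0x100000, 0x10FFFD),  # supplementary plane 16
)


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in any private-use range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


# Number of private-use code points in the Basic Multilingual Plane.
bmp_pua_size = 0xF8FF - 0xE000 + 1

print(is_private_use("\ue000"), is_private_use("A"), bmp_pua_size)
```

The standard library offers an equivalent test: unicodedata.category(ch) returns "Co" for private-use characters.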
Question: What are Unicode codepoints?
Answer: Codepoints are numbers, similar to ASCII codes, that represent the symbols and characters found in the writing systems of languages throughout the world. Instead of the 256 symbols available in an 8-bit character set, the Basic Multilingual Plane provides 65,536 code values (64K). By convention, these code points are written as four hexadecimal digits prefixed with “U+”. For example, U+E000 represents the code value E000 (decimal 57,344), which is the lowest number in the PUA range.
Moreover, Unicode assumes smart font technology in the software that uses it. This means that Unicode needs to assign only one codepoint to a letter or symbol whose shape may vary depending on context. For example, smart fonts will place an accent in the proper position on a letter, whether that letter is lowercase or uppercase, wide or thin. They will even use a dotless i (or other letter variant) when appropriate.
Codepoints in the 16 supplementary planes may be represented using 5 or 6 hexadecimal digits. For example, the lowest codepoint in supplementary plane 15 is U+F0000, and the lowest codepoint in plane 16 is U+100000.
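The notation can be illustrated with a short Python sketch. The helper names format_code_point and plane_of are invented for this example; the only assumption is the usual convention of writing at least four hexadecimal digits after “U+”.

```python
def format_code_point(cp: int) -> str:
    """Write a code point as U+XXXX, using at least four hexadecimal digits."""
    return f"U+{cp:04X}"


def plane_of(cp: int) -> int:
    """Each plane holds 65,536 (0x10000) code points, so the plane is cp // 0x10000."""
    return cp >> 16


for cp in (0xE000, 0xF0000, 0x100000):
    # Prints the notation, the decimal value, and the plane number.
    print(format_code_point(cp), cp, plane_of(cp))
```

Plane 0 is the Basic Multilingual Plane; planes 1 to 16 are the supplementary planes.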
Most of the supplementary planes are still unassigned and have been provided for future extensions. The SIL PUA strategy does include using the PUA supplementary planes for cross mapping between various entity allocations in the Basic Multilingual Plane PUA area.
Question: What is SIL’s PUA assignment strategy?
Answer: SIL’s corporation-wide PUA assignments use the codepoint range U+F100–U+F8FF. Individual SIL field entities may also make independent PUA character assignments in the range U+E000–U+EFFF.
SIL’s corporate strategy is based on PUA Corporate Strategy.
Question: What are SIL’s PUA assignments?
Answer: The SIL PUA strategy allows local SIL entities to make free use of the lower portion of the PUA range, U+E000–U+EFFF, while the upper portion, U+F100–U+F8FF, is reserved and managed for corporation-wide use.
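As an illustration of this split, the following Python sketch classifies a code point by the two ranges named above. The function name sil_pua_region and the labels "entity" and "corporate" are invented for this example. Code points outside both ranges, including U+F000–U+F0FF, which this FAQ does not describe, are reported as None.

```python
from typing import Optional

ENTITY_RANGE = (0xE000, 0xEFFF)     # free for local SIL entities
CORPORATE_RANGE = (0xF100, 0xF8FF)  # reserved for corporation-wide use


def sil_pua_region(cp: int) -> Optional[str]:
    """Return which SIL-managed PUA region a code point falls in, if any."""
    if ENTITY_RANGE[0] <= cp <= ENTITY_RANGE[1]:
        return "entity"
    if CORPORATE_RANGE[0] <= cp <= CORPORATE_RANGE[1]:
        return "corporate"
    return None  # not covered by the ranges quoted in this FAQ


print(sil_pua_region(0xE123), sil_pua_region(0xF200), sil_pua_region(0xF050))
```

A check like this could be run over the code points of a custom font to confirm that locally added characters stay within U+E000–U+EFFF.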
The SIL PUA committee publishes documentation on approved assignments of characters to Corporate-controlled portions of the PUA.
The following table shows how Unicode private-use character assignments are arranged within the SIL corporate-wide portion of the PUA.
Roadmap (Revised 2003-8-11)
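Question: What is the SIL PUA version?
Answer: Over time the list of characters assigned to the SIL PUA changes. These changes take place primarily for three reasons: (1) characters are added, (2) characters are deprecated, for example when a character is accepted into Unicode, and (3) minor bugs are fixed.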
Changes to the PUA assignments will eventually result in changes to products that utilize these characters, including keyboards, fonts and encoding mapping tables. Additions or minor bug fixes (reasons 1 and 3) are not likely to impact many users (unless, of course, the addition or fix was something you specifically needed).
Deprecation of PUA characters (2), however, can make a big difference for many users, and you need to exercise caution when installing updated keyboards or mapping tables. For example, when a character is accepted into Unicode, we will release new keyboards that now reference the official character code rather than the PUA code. If you did part of your project with the old keyboard and part with the new, then you could have inconsistent data: some of it using the PUA code and some using the official character code.
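The inconsistency described above can be detected, and repaired, with a simple scan. The Python sketch below is illustrative only: the mapping HYPOTHETICAL_DEPRECATIONS is an invented example pair, not a real SIL assignment, and the function names are made up for this example. A real project would take its mapping from the published SIL deprecation documentation.

```python
# Hypothetical example: a PUA code and the official code that replaced it.
# This pair is NOT a real SIL assignment.
HYPOTHETICAL_DEPRECATIONS = {"\uf123": "\u0251"}


def mixed_usage(text: str, deprecations: dict) -> list:
    """List (pua, official) pairs for which BOTH codes occur in the text."""
    return [
        (pua, official)
        for pua, official in deprecations.items()
        if pua in text and official in text
    ]


def migrate(text: str, deprecations: dict) -> str:
    """Replace every deprecated PUA code with its official counterpart."""
    return text.translate(str.maketrans(deprecations))


sample = "b\uf123d b\u0251d"  # same word typed with the old and the new keyboard
print(mixed_usage(sample, HYPOTHETICAL_DEPRECATIONS))
print(migrate(sample, HYPOTHETICAL_DEPRECATIONS) == "b\u0251d b\u0251d")
```

Running the scan before installing an updated keyboard shows whether existing data still depends on the PUA codes.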
A critical question in all this is how to know which keyboards, for example, are compatible with each other, i.e., generate the same code (PUA or official) for a given character.
We have decided to implement a versioning system that will allow you to easily determine the compatibility (at least as far as PUA codepoints are concerned) between various components. There will be two parts to the PUA version identifier:
Looking at our history of PUA documentation available on SIL Corporate PUA Assignments, we see
From this we can determine:
Therefore we conclude that we should think twice before using, for example, both 4.1e and 5.0 keyboards on the same project. But we could probably use any keyboard from 5.0 to 5.0c interchangeably.
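The comparison in this example can be sketched in Python. This sketch rests on an assumption inferred only from the identifiers quoted above (4.1e, 5.0, 5.0c): that an identifier is a numeric part followed by an optional lowercase letter, and that two components are safe to mix when their numeric parts match. The function names are invented here; the strategy document remains the authority on the actual rules.

```python
import re

# Assumed shape of an identifier: digits, a dot, digits, optional letter.
_VERSION = re.compile(r"^(\d+\.\d+)([a-z]?)$")


def split_pua_version(version: str) -> tuple:
    """Split e.g. '5.0c' into ('5.0', 'c'); raise ValueError if malformed."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognised PUA version: {version!r}")
    return match.group(1), match.group(2)


def probably_compatible(a: str, b: str) -> bool:
    """Assumption: components are interchangeable when numeric parts are equal."""
    return split_pua_version(a)[0] == split_pua_version(b)[0]


print(probably_compatible("4.1e", "5.0"))  # differing numeric parts
print(probably_compatible("5.0", "5.0c"))  # same numeric part
```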
For further reading
The PUA versioning scheme is one part of the SIL PUA Deprecation Strategy. The complete strategy document is available: A strategy for deprecating SIL PUA characters.
Question: What is the PUA Committee?
Answer: The SIL PUA Committee is chartered by SIL International as an advisory body to the Non-Roman Script Initiative (NRSI) to develop and promote policy for managed use of the Unicode Private Use Area (PUA) within SIL. Because of the far-reaching effects of its decisions, the NRSI requested that the SIL PUA Committee be set up so it could hold itself accountable to the rest of the corporation. The committee is made up of members that represent various academic domains within SIL. The committee is also the point of contact within SIL and with other organizations regarding the PUA.
Question: How do I use the PUA?
Answer: The SIL PUA Committee recommends the following guidelines to individuals and entities within SIL who desire to add custom characters to Unicode fonts.
Individual users should check with their local SIL entity to see whether it has published or unpublished PUA guidelines for their area.
Using appropriate font software, add characters only to the lower PUA region U+E000–U+EFFF.
If you need characters that you expect will be included in published works, particularly when developing a new orthography, contact your local SIL entity or the to see if there are characters already developed that will suit your needs.
Question: How do I add characters to my entity block?
Answer: First, check thoroughly that your character is not already in Unicode (see Unicode 8.0 Latin and Cyrillic characters – sorted, Unicode Encoding Resources or http://www.unicode.org/charts) or within SIL’s Corporate PUA Assignments. Second, if you do not find a suitable character, contact your entity Unicode representative.
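A first pass at the Unicode check can be automated by searching character names. The Python sketch below uses the standard unicodedata module; the function name find_by_name is invented for this example. It only covers the Unicode version bundled with the Python interpreter (reported by unicodedata.unidata_version), and a name search cannot replace a careful look at the code charts, so treat an empty result as a hint rather than proof.

```python
import sys
import unicodedata


def find_by_name(*keywords: str) -> list:
    """Return (code point, name) pairs whose name contains every keyword."""
    wanted = [k.upper() for k in keywords]
    hits = []
    for cp in range(sys.maxunicode + 1):
        # Unassigned, private-use and surrogate code points have no name.
        name = unicodedata.name(chr(cp), "")
        if name and all(k in name for k in wanted):
            hits.append((f"U+{cp:04X}", name))
    return hits


print(unicodedata.unidata_version)
for code, name in find_by_name("latin", "dotless"):
    print(code, name)
```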
Question: Where should I go to get help with organising my entity block?
Answer: A document detailing this information is presently being worked on.
Question: How do I submit characters for inclusion in the SIL corporate PUA (the corporation-wide block)?
Answer: Use the SIL Private Use Area Registration Form. Only proposals submitted by SIL Entities or FOBAI member organizations will be considered.
Question: How do I get characters added to Unicode?
Answer: The easiest way for most SIL and FOBAI members is to submit a proposal for their inclusion in the SIL corporate PUA (see SIL Private Use Area Registration Form). The submission form includes the option of requesting that a Unicode proposal be made for your character(s).
Question: What is the relationship between SIL’s PUA Committee and the UTC? Is there any conflict between this strategy and the Unicode guidelines?
Answer: The PUA Committee operates within the guidelines of the Unicode Technical Committee (UTC), as described in the “Private Use Area: U+E000–U+F8FF” section of Special Areas and Format Characters (Unicode Consortium. 2003. The Unicode Standard, Version 4.0. Boston, MA: Addison-Wesley), and aims to work in tandem with the UTC.
2010-01-05 LP: changed IFOBA to FOBAI
2005-06-24 LP: added IFOBA, where relevant
2004-01-27 LP: page creation