Procedure for Registration of SIL Entity Private Use Area Assignments
(For Entity Portion of the PUA)
At the Computer Technical Conference in November 1998, the following motion was passed unanimously:
WHEREAS we require text encoding standards for the purposes of data interchange and archiving, and WHEREAS the Unicode Private Use Area is a limited Corporate resource,
MOVED to request the NRSI to develop and implement a plan for management of the Private Use Area and Unicode Surrogate allocations which balances both Corporate-wide and entity-specific needs.
The primary concerns regarding the limited PUA space are as follows. Giving SIL entities complete freedom in their use of the PUA can result in haphazard and inefficient use of that resource or, worse, in incompatibilities that would hinder the exchange of data within the Corporation. On the other hand, having a single body determine every aspect of PUA use would rob field entities of flexibility and delay their work. A better arrangement is a compromise between the need to coordinate use of the PUA and the desire to limit hindrances on field entities.
A basic strategy for this, outlined in Hosken (1998), has been endorsed by the Language Software Board (cf. Language Software Board 1999, Section C.2 and Appendix E). That plan allows for SIL entities to independently make PUA character assignments, according to their individual needs, in the range U+E000..U+EFFF, and requests that they register such character assignments with a central, coordinating body, the Non-Roman Script Initiative (NRSI). The plan also empowers the NRSI to make character assignments in the range U+F000..U+F8FF for characters that they deem to be of potential usefulness throughout the Corporation.
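As a minimal sketch of the division just described, the following Python classifies a BMP codepoint by which body may assign it. The two range boundaries are taken from the paragraph above; the function name and the returned labels are hypothetical and chosen only for illustration.

```python
# Range boundaries come from the text; names and labels are illustrative only.
ENTITY_RANGE = range(0xE000, 0xF000)     # U+E000..U+EFFF, assigned by SIL entities
CORPORATE_RANGE = range(0xF000, 0xF900)  # U+F000..U+F8FF, assigned by the NRSI


def pua_owner(codepoint: int) -> str:
    """Return which body may assign this BMP PUA codepoint."""
    if codepoint in ENTITY_RANGE:
        return "entity"
    if codepoint in CORPORATE_RANGE:
        return "nrsi"
    return "not-bmp-pua"


# Size of each portion of the 6,400-codepoint BMP Private Use Area.
print(len(ENTITY_RANGE), len(CORPORATE_RANGE))
print(pua_owner(0xE123), pua_owner(0xF8FF), pua_owner(0x0041))
```

Under this split, entities have 4,096 codepoints to work with and the NRSI has 2,304.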
This document describes the PUA character registration process and the procedure that SIL entities (or individuals) should follow when submitting registrations.
PUA character registration
PUA character registrations made by individual SIL entities will be incorporated into an SIL registry of PUA character assignments maintained for the Corporation by the NRSI. This endeavor, together with the conventions adopted for utilizing the PUA space mentioned above, represents an introduction of new bureaucracy into our Corporate operations which will place a greater demand on our limited resources. As a result, it is reasonable to question what the intended benefits are. There are several purposes for PUA character registration:
- It provides a means for documenting encodings used for SIL language data, which is useful for archival purposes.
- It provides a means by which data from any source within SIL might be unambiguously interpreted in terms of a reference set of character semantics.
- It results in greater consistency in encoding practices both within individual entities and throughout the Corporation.
- It provides a means for allocating characters that are of potential use throughout the Corporation (e.g. special phonetic characters) in a manner that will not result in conflicts with PUA characters used by individual entities.
- It provides SIL entities with an independent, expert review of their PUA character assignments which can catch potential encoding problems early on.
- Should an individual entity make a character assignment which may be of wider usefulness within the Corporation, it provides a mechanism whereby such characters can be brought to the attention of the body responsible for Corporate-wide character assignments, and duplication of character assignments can be avoided.
- It facilitates identification of characters that merit formal adoption within the Unicode standard, and provides some of the information that would be required in preparing proposals for addition to the standard.
As it is not possible for the NRSI to impose a mandatory requirement on other SIL entities that they submit registrations for all PUA character assignments, the registration process is voluntary. Nevertheless, it is in the best interest of SIL entities and of the Corporation as a whole that all PUA character assignments be added to the registry. Since the registry has been created in response to the unanimous action of all SIL entities represented at the November 1998 CTC, it is expected that SIL entities will treat the registration process as though it were mandatory.
The registry maintained by NRSI will include all of the information to be supplied with submissions (see below). It will also include other information provided by NRSI.
The registry will map every registered character in the PUA space of ISO 10646 plane 0 (the Basic Multilingual Plane, or BMP) to a unique codepoint in the plane 15 or plane 16 PUA space of ISO 10646. (For further information, see Hosken 1998.) These assignments will be made by the NRSI, possibly in coordination with other cooperating organizations (e.g. UBS). Planes 15 and 16 together contain 131,072 codepoints, of which all but the final two in each plane are available for private use. Character assignments in this larger PUA space will provide a single, unambiguous encoding applicable to all SIL language data, which is useful for archiving and for exchanging data between entities that have conflicting character assignments in the plane 0 PUA range.
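The mapping idea can be sketched as follows. This is an illustration only: the actual allocation scheme is the one described in Hosken (1998), which is not reproduced here. The class name, the sequential allocation order, and the choice to key registrations by entity name are all assumptions made for the example. The last two codepoints of each plane are noncharacters, so they are left out of the pool.

```python
from itertools import chain
from typing import Dict, Tuple

# Private-use codepoints in planes 15 and 16; the final two codepoints of
# each plane (xFFFE and xFFFF) are noncharacters and are excluded.
PLANE15 = range(0xF0000, 0xFFFFE)
PLANE16 = range(0x100000, 0x10FFFE)


class PuaRegistry:
    """Hypothetical registry; sequential allocation is an assumption."""

    def __init__(self) -> None:
        self._map: Dict[Tuple[str, int], int] = {}
        self._free = chain(PLANE15, PLANE16)

    def register(self, entity: str, bmp_codepoint: int) -> int:
        """Map an entity's BMP PUA codepoint to a unique plane 15/16 codepoint."""
        if not 0xE000 <= bmp_codepoint <= 0xF8FF:
            raise ValueError("not a BMP private-use codepoint")
        key = (entity, bmp_codepoint)
        if key not in self._map:
            self._map[key] = next(self._free)
        return self._map[key]


registry = PuaRegistry()
# Two entities that use U+E000 for different characters no longer collide.
a = registry.register("Entity A", 0xE000)
b = registry.register("Entity B", 0xE000)
print(f"U+{a:X} U+{b:X}")
print(len(PLANE15) + len(PLANE16))  # usable private-use codepoints in both planes
```

Registering the same entity and codepoint a second time returns the existing assignment, so the mapping stays stable for archived data.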
Information to be submitted
The following items of information should be submitted with each registration of a PUA character:
- Name of person submitting character registration
- Name of SIL entity represented.
- Date of submission
- Sensitivity of information: the level of sensitivity to associate with the submission, indicated using one of the following categories:
  - Information cannot be provided to anyone without permission from
  - Information can be provided freely to the following entities
  - Information can be freely provided to all
- Entity-PUA codepoint.
- Date added by entity to entity’s PUA character set (optional).
- Representative glyph; this can be submitted in either of the following forms:
  - Font (TTF or Type1)
  - Scanned image (300 dpi TIFF, BMP, PNG, or GIF). Archival-quality grayscale TIFF or BMP images at a resolution of 300 dpi are preferred. 300 dpi PNG or GIF are acceptable compressed formats but may not be acceptable for formal Unicode proposals.
- Name of character in local language/script (optional).
- Name of character in English or Roman transliteration.
- Typical orthographic realization or function: In most cases, the abstract character will have a direct correspondence to an element within one or more orthographies. In some cases, however, there may be an indirect relationship (e.g. the character represents a vowel quality which can have more than one orthographic realization, or which is realized by a combination of several orthographic elements), or the character may represent an abstract meaning (e.g. vowel length, or upper case). Where this is not fully clear from the character category (see the next item), a complete explanation of the relationship of the character to orthographic elements and/or its function in representing a writing system (or multiple writing systems) should be provided. Examples, in a document using the submitted font or as scanned images, may be provided.
- Category of character (see Unicode general character categories).
- Combining class (can be expressed as a Unicode combining class or as a list of other characters with which it interacts typographically). Examples, in a document using a submitted font or as scanned images, should be provided.
- Character decomposition (optional – NRSI will research this for each character registration).
- Presentation form decomposition: if the character is a presentation form of another character or of a sequence of other characters, the non-presentation-form decomposition (“normal form”) must be given. It would also be helpful to provide a brief explanation of the relationship between the presentation form and the normal form.
- Directionality (can be expressed as a Unicode bidirectional character type or as a prose description).
- Corresponding mirrored character (if applicable).
- Numeric value: indication must be given if the character represents a numerical value but is not a decimal digit. (See 4.6, Numeric Value—Normative.)
- Case mappings:
  - Upper case equivalent.
  - Lower case equivalent.
  - Title case equivalent.
- Exemplar languages using this character (optional – this will be useful if a proposal for standard allocation in Unicode is to be prepared).
- Encoding standards containing the character (optional – this will be useful if a proposal for standard allocation in Unicode is to be prepared).
- Brief description of usage and acceptance of character within linguistic communities (optional – this will be useful if a proposal for standard allocation in Unicode is to be prepared).
- Availability of publications demonstrating usage of character (optional – this will be useful if a proposal for standard allocation in Unicode is to be prepared).
- Example sentence, in a document using the submitted font, containing the codepoint. Include the language name and an English translation. (optional – used for proposal purposes).
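The items above could be captured in a structured record along the following lines. This is a sketch, not an NRSI schema: the field names, the sensitivity labels, and the sample values are all invented for illustration, and the combining class is simplified to a number even though the list allows a prose description. The optional proposal-related items are omitted for brevity.

```python
import unicodedata
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# The 30 Unicode general category abbreviations.
GENERAL_CATEGORIES = set(
    "Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pc Pd Ps Pe Pi Pf Po "
    "Sm Sc Sk So Zs Zl Zp Cc Cf Cs Co Cn".split()
)
# Hypothetical labels for the three sensitivity levels listed above.
SENSITIVITY_LEVELS = {"permission-required", "named-entities", "public"}


@dataclass
class PuaRegistration:
    """One submission; field names are illustrative, not an NRSI schema."""
    submitter: str
    entity: str
    submitted: date
    sensitivity: str
    codepoint: int            # entity-PUA codepoint
    glyph_file: str           # font or scanned image
    english_name: str
    function: str             # orthographic realization or function
    general_category: str
    combining_class: int      # simplification: numeric class only
    directionality: str       # bidirectional type or prose description
    local_name: Optional[str] = None
    decomposition: Optional[str] = None
    mirrored_codepoint: Optional[int] = None
    upper: Optional[int] = None
    lower: Optional[int] = None
    title: Optional[int] = None

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none found."""
        found = []
        if not 0xE000 <= self.codepoint <= 0xEFFF:
            found.append("codepoint is outside the entity range U+E000..U+EFFF")
        if self.sensitivity not in SENSITIVITY_LEVELS:
            found.append("unknown sensitivity level")
        if self.general_category not in GENERAL_CATEGORIES:
            found.append("unknown general category")
        if not 0 <= self.combining_class <= 254:
            found.append("combining class must be in 0..254")
        for label in ("submitter", "entity", "glyph_file", "english_name"):
            if not getattr(self, label).strip():
                found.append(f"{label} is required")
        return found


# Invented sample values, for illustration only.
record = PuaRegistration(
    submitter="A. Linguist", entity="Example Branch", submitted=date(1999, 6, 1),
    sensitivity="public", codepoint=0xE001, glyph_file="example.ttf",
    english_name="EXAMPLE TONE MARK", function="marks high tone above a vowel",
    general_category="Mn", combining_class=230, directionality="NSM",
)
print(record.problems())

# For comparison, the same vocabulary as reported for a standard character.
acute = "\u0301"  # COMBINING ACUTE ACCENT
print(unicodedata.category(acute), unicodedata.combining(acute),
      unicodedata.bidirectional(acute))
```

Checking a record before submission catches simple slips, such as a codepoint in the NRSI range, before they reach the registry. The final lines show where a submitter can look up the category, combining class, and directionality of a comparable standard character.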
References
Hosken, Martin. 1998. PUA Corporate Strategy: A discussion on the organization of the PUA. Ms. (Available on the IPub Resource Collection 98 CD-ROM. Dallas: SIL.)
Language Software Board, SIL. 1999. LSB script strategy for SIL language software. Ms.
© 2003-2018 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.