Short URL: http://scripts.sil.org/PUA_Procedure
Procedure for Registration of SIL Entity Private Use Area Assignments
(For Entity Portion of the PUA)
Peter Constable, 1999-05-14
Introduction
At the Computer Technical Conference in November 1998, the following motion was passed unanimously:
WHEREAS we require text encoding standards for the purposes of data interchange and archiving, and
WHEREAS the Unicode Private Use Area is a limited Corporate resource,
MOVED to request the NRSI to develop and implement a plan for management of the Private Use Area and
Unicode Surrogate allocations which balances both Corporate-wide and entity-specific needs.
The primary concerns regarding the limited PUA space are the following. Giving SIL entities complete freedom
in the utilization of the PUA space can result in haphazard and inefficient use of that resource or, worse,
in incompatibilities which would hinder exchange of data within the Corporation. On the other hand,
imposing a situation in which a single body determines every aspect of PUA utilization would rob field
entities of flexibility and would delay their work. A better situation would be a compromise between the
need to coordinate use of the PUA and the desire to limit hindrances on field entities.
A basic strategy for this, outlined in Hosken (1998), has been
endorsed by the Language Software Board (cf. Language Software Board 1999, Section C.2 and Appendix E). That
plan allows for SIL entities to independently make PUA character assignments, according to their individual
needs, in the range U+E000..U+EFFF, and requests that they register such character assignments with a
central, coordinating body, the Non-Roman Script Initiative (NRSI). The plan also empowers the NRSI to make
character assignments in the range U+F000..U+F8FF for characters that they deem to be of potential usefulness
throughout the Corporation.
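The split of the plane 0 PUA into two ranges can be sketched in code. The following Python fragment is only
an illustration: the function name `pua_zone` and the labels it returns are assumptions made for this
example and are not defined by the plan.

```python
# Illustrative sketch only: these names are hypothetical, not part of the plan.
ENTITY_RANGE = range(0xE000, 0xF000)     # U+E000..U+EFFF, assigned by SIL entities
CORPORATE_RANGE = range(0xF000, 0xF900)  # U+F000..U+F8FF, assigned by the NRSI


def pua_zone(codepoint: int) -> str:
    """Return which part of the plane 0 Private Use Area a codepoint falls in."""
    if codepoint in ENTITY_RANGE:
        return "entity"
    if codepoint in CORPORATE_RANGE:
        return "corporate"
    return "not plane 0 PUA"


for cp in (0xE000, 0xEFFF, 0xF000, 0xF8FF, 0x0041):
    print(f"U+{cp:04X}: {pua_zone(cp)}")
```

A check of this kind could be used, for instance, to confirm that a codepoint an entity intends to assign
falls in the entity portion of the PUA before a registration is prepared.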
This document describes what the PUA character registration process consists of and provides a description
of procedures for SIL entities (or individuals) to follow in submitting registrations.
PUA Character registration
PUA character registrations made by individual SIL entities will be incorporated into an SIL registry of
PUA character assignments maintained for the Corporation by the NRSI. This endeavor, together with the
conventions adopted for utilizing the PUA space mentioned above, represents an introduction of new
bureaucracy into our Corporate operations which will place a greater demand on our limited resources. As a
result, it is reasonable to question what the intended benefits are. There are several purposes for PUA
character registration:
- It provides a means for documenting encodings used for SIL language data, which is useful for archival
purposes.
- It provides a means by which data from any source within SIL might be unambiguously interpreted in
terms of a reference set of character semantics.
- It results in greater consistency in encoding practices both within individual entities and throughout
the Corporation.
- It provides a means for allocating characters that are of potential use throughout the Corporation
(e.g. special phonetic characters) in a manner that will not result in conflicts with PUA characters used
by individual entities.
- It provides SIL entities with an independent, expert review of their PUA character assignments which
can catch potential encoding problems early on.
- Should an individual entity make a character assignment which may be of wider usefulness within the
Corporation, it provides a mechanism whereby such characters can be brought to the attention of the body
responsible for Corporate-wide character assignments, and duplication of character assignments can be
avoided.
- It facilitates identification of characters that merit formal adoption within the Unicode standard, and
provides some of the information that would be required in preparing proposals for addition to the
standard.
As it is not possible for the NRSI to impose a mandatory requirement on other SIL entities that they
submit registrations for all PUA character assignments, the registration process is voluntary. Nevertheless,
it is in the best interest of SIL entities and of the Corporation as a whole that all PUA character
assignments be added to the registry. Since the registry has been created in response to the unanimous action
of all SIL entities represented at the November 1998 CTC, it is expected that SIL entities will treat the
registration process as though it were mandatory.
The registry maintained by NRSI will include all of the information to be supplied with submissions (see
below). It will also include other information provided by NRSI.
The registry will provide a mapping for every registered character in the PUA space of ISO 10646 plane 0
(the Basic Multilingual Plane, or BMP) to a unique codepoint in the plane 15 or plane
16 PUA space of ISO 10646. (For further information, see Hosken 1998.) These assignments will be made by the
NRSI, possibly in coordination with other cooperating organizations (e.g. UBS). These character assignments
in the larger PUA space of planes 15 and 16 (131,072 total codepoints) will provide a single, unambiguous
encoding applicable to all SIL language data, which can be useful for archiving and for exchange of data
between entities that have conflicting character assignments in the plane 0 PUA range.
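As a rough illustration of such a mapping, the sketch below keys each registered character by the entity
and its plane 0 PUA codepoint and assigns it a codepoint in plane 15 or 16. The class name `Registry`, the
entity names, and the sequential allocation order are all invented for this example; the actual allocation
scheme is for the NRSI to decide (see Hosken 1998).

```python
# Hypothetical sketch: entity names and the sequential allocation order are
# invented for illustration; they are not the NRSI's actual scheme.
PLANE15_START = 0xF0000  # first codepoint of plane 15
PLANE16_END = 0x10FFFF   # last codepoint of plane 16

# Two full planes of 65,536 codepoints each: 131,072 in total.
TOTAL_CODEPOINTS = PLANE16_END - PLANE15_START + 1


class Registry:
    """Map (entity, plane 0 PUA codepoint) pairs to unique plane 15/16 codepoints."""

    def __init__(self):
        self._assigned = {}
        self._next = PLANE15_START

    def register(self, entity, bmp_codepoint):
        if not 0xE000 <= bmp_codepoint <= 0xF8FF:
            raise ValueError("not a plane 0 PUA codepoint")
        key = (entity, bmp_codepoint)
        if key not in self._assigned:
            # Skip the last two codepoints of each plane (xFFFE and xFFFF),
            # which Unicode reserves as noncharacters.
            while (self._next & 0xFFFE) == 0xFFFE:
                self._next += 1
            if self._next > PLANE16_END:
                raise RuntimeError("plane 15/16 PUA space exhausted")
            self._assigned[key] = self._next
            self._next += 1
        return self._assigned[key]


registry = Registry()
# Two entities have assigned different characters to the same plane 0 codepoint.
a = registry.register("Entity A", 0xE000)
b = registry.register("Entity B", 0xE000)
print(f"Entity A U+E000 -> U+{a:X}")
print(f"Entity B U+E000 -> U+{b:X}")
```

Because the key includes the entity, two entities whose plane 0 assignments conflict still receive distinct
codepoints in the larger space, which is what makes the plane 15/16 encoding unambiguous.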
Registration procedure
Information to be submitted
The following items of information should be submitted with each registration of a PUA character:
- Name of person submitting character registration
- Name of SIL entity represented.
- Date of submission
- Sensitivity of information: an indication will be given regarding the level of sensitivity to associate
with a given submission using the following categories:
  - Information cannot be provided to anyone without permission from submitter
  - Information can be provided freely to the following entities
  - Information can be freely provided to all
- Entity-PUA codepoint.
- Date added by entity to entity’s PUA character set (optional).
- Representative glyph; this can be submitted in the following forms:
  - Font (TTF or Type1)
  - Scanned image (300 dpi TIFF, BMP, PNG, or GIF). Archival-quality grayscale TIFF or BMP images with
    a resolution of 300 dpi are preferred. 300 dpi PNG or GIF are acceptable compressed formats but may
    not be acceptable for formal Unicode proposals.
- Name of character in local language/script (optional).
- Name of character in English or Roman transliteration.
- Typical orthographic realization or function: In most cases, the abstract character will have a
direct correspondence to an element within one or more orthographies. In some cases, however, there may
be an indirect relationship (e.g. the character represents a vowel quality which can have more than one
orthographic realization, or which is realized by a combination of several orthographic elements), or
the character may represent an abstract meaning (e.g. vowel length, or upper case). Where this is not
fully clear from the character category (see the Category of character item), a complete explanation of
the relationship of the character to orthographic elements and/or its function in representing a writing
system (or multiple writing systems) should be provided. Examples, in a document using the submitted font
or as scanned images, may be provided.
- Category of character (see Unicode general character categories).
- Combining class (can be expressed as a Unicode combining class or as a list of other characters with
which it interacts typographically). Examples, in a document using a submitted font or as scanned images,
should be provided.
- Character decomposition (optional – NRSI will research this for each character registration).
- Presentation form decomposition: if character is a presentation form of another character or of a
sequence of other characters, the non-presentation-form decomposition (“normal form”) must be given. It
would also be helpful to provide a brief explanation of the relationship between the presentation form
and the normal form.
- Directionality (can be expressed as a Unicode bidirectional character type or as a prose description).
- Corresponding mirrored character (if applicable).
- Numeric value: indication must be given if the character represents a numerical value but is not a
decimal digit. (See Section 4.6, Numeric Value, of the Unicode Standard.)
- Case mappings:
  - Upper case equivalent.
  - Lower case equivalent.
  - Title case equivalent.
- Exemplar languages using this character (optional – this will be useful if a proposal for standard
allocation in Unicode is to be prepared).
- Encoding standards containing the character (optional – this will be useful if a proposal for
standard allocation in Unicode is to be prepared).
- Brief description of usage and acceptance of character within linguistic communities (optional –
this will be useful if a proposal for standard allocation in Unicode is to be prepared).
- Availability of publications demonstrating usage of character (optional – this will be useful if a
proposal for standard allocation in Unicode is to be prepared).
- Example sentence, in a document using the submitted font, containing the codepoint. Include the
language name and an English translation. (optional – used for proposal purposes).
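The items listed above can be pictured as a single record per character. The sketch below is one possible
way to hold a submission in Python and to check it before sending; it covers only a subset of the items,
and the class names, field names, and sample values are all invented for this example rather than taken
from the registration form.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Sensitivity(Enum):
    RESTRICTED = "not to be provided without permission from submitter"
    NAMED_ENTITIES = "freely provided to named entities"
    PUBLIC = "freely provided to all"


@dataclass
class Registration:
    """Hypothetical record covering a subset of the submission items."""
    # Required items
    submitter: str
    entity: str
    submitted: date
    sensitivity: Sensitivity
    codepoint: int         # entity-PUA codepoint
    glyph_file: str        # font or scanned image showing the glyph
    english_name: str
    function: str          # typical orthographic realization or function
    category: str          # Unicode general category
    combining_class: str
    directionality: str
    # Optional items
    local_name: Optional[str] = None
    date_added: Optional[date] = None
    exemplar_languages: Optional[List[str]] = None

    def problems(self) -> List[str]:
        """List reasons the record is not ready to submit (empty if none)."""
        issues = []
        # Assumption: entity registrations use the entity range U+E000..U+EFFF.
        if not 0xE000 <= self.codepoint <= 0xEFFF:
            issues.append(f"U+{self.codepoint:04X} is outside U+E000..U+EFFF")
        for name in ("submitter", "entity", "glyph_file", "english_name",
                     "function", "category", "combining_class", "directionality"):
            if not getattr(self, name).strip():
                issues.append(f"missing required item: {name}")
        return issues


# Invented sample values, for illustration only.
sample = Registration(
    submitter="A. Example",
    entity="Example Branch",
    submitted=date(1999, 5, 14),
    sensitivity=Sensitivity.PUBLIC,
    codepoint=0xE000,
    glyph_file="example.ttf",
    english_name="EXAMPLE LETTER",
    function="corresponds directly to one letter of the orthography",
    category="Lo",
    combining_class="0",
    directionality="left-to-right",
)
print(sample.problems())
```

Keeping optional items as `None` by default mirrors the distinction the list draws between information that
should accompany every registration and information that is only needed if a Unicode proposal is prepared.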
References
Hosken, Martin. 1998. PUA Corporate Strategy: A discussion on the organization of the PUA. Ms. (Available
on the IPub Resource Collection 98 CD-ROM. Dallas: SIL.)
Language Software Board, SIL. 1999. LSB script strategy for SIL language software. Ms.
© 2003-2012 SIL International, all rights
reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.