This is an archive of the original scripts.sil.org site, preserved as a historical reference. Some of the content is outdated. Please consult our other sites for more current information: software.sil.org, ScriptSource, FDBP, and silfontdev



Home

Contact Us

General

Initiative B@bel

WSI Guidelines

Encoding

Principles

Unicode

Training

Tutorials

PUA

Conversion

Resources

Utilities

TECkit

Maps

Resources

Input

Principles

Utilities

Tutorials

Resources

Type Design

Principles

Design Tools

Formats

Resources

Font Downloads

Gentium

Doulos

IPA

Rendering

Principles

Technologies

OpenType

Graphite

Resources

Font FAQ

Links

Glossary


Computers & Writing Systems

SIL HOME | SIL SOFTWARE | SUPPORT | DONATE | PRIVACY POLICY

You are here: General > WSI Guidelines
Short URL: https://scripts.sil.org/WSI_Guidelines_Sec_5_3

Guidelines for Writing System Support: Technical Details: Characters, Codepoints, Glyphs: Part 3

Peter Constable, 2003-09-05

5.3  Keystrokes and codepoints

Input methods represent the third of three basic components for working with text data on a computer that were introduced earlier. In general, input methods can include things like voice- or handwriting-recognition. Keyboards are the most common form of input, however, and also the only one that is easily extended or modified. This discussion will therefore focus on keyboard input.1

5.3.1  From keystrokes to codepoints

Just as codepoints and glyphs are the counterparts to characters in the encoding and rendering components, keystrokes are the counterpart to characters in the keyboarding component. Whereas characters (or codepoints) get transformed into glyphs in the rendering process, keystrokes are transformed into codepoints in the input process.

All computer operating systems include software to handle keyboard input, and many provide more than one keyboard layout; that is, more than one set of mappings from keys to characters. Many keyboard input methods use a strictly one-to-one mappings from keystrokes to characters: for each keystroke, there is one particular character that is generated.2 But some keyboards provide alternative mappings based on a previously-entered “dead key” (a key that does not enter a character, but rather changes the character entered by the following key). For example, typing “`” followed by “a” to get “à”, but “`” followed by “o” to get “ò”.

Just as the mapping from characters to glyph might involve complex, many-to-many mappings, the same is potentially true for keyboard input. For example, it would be possible to have a single keystroke that generated a sequence of several characters, such as <n, g, b>, or <GREEK SMALL ALPHA, COMBINING ROUGH BREATHING MARK, COMBINING ACUTE, COMBINING IOTA SUBSCRIPT>. Similarly, it would be possible for a sequence of keystrokes to generate single characters, perhaps with each keystroke in the sequence changing the previous output.

Figure 10: Possible keystroke sequences mapping to single characters (hypothetical)



Different input methods might generate exactly the same characters, though in different ways. For example, one may use a single keystroke to generate a given character, while another uses two or more keystrokes (the first, perhaps, being a dead key) to generate that character.

Figure 11: Two input methods: same character, different number of keystrokes



Complex input methods can map a single keystroke to different characters, depending on context. For example, in the case of Greek sigma, an input method may use a single key, such as the  s  key, to enter all forms of sigma, with the input method generating either a FINAL SIGMA or NON-FINAL SIGMA according to the context:



Figure 12: Greek sigma: keyboard input of presentation-form-encoded data



Note that, when the  s  key is pressed, the sigma is at that point word-final. It is the next character that is entered that determines whether or not the sigma will remain word-final. If another word-forming character is entered, then the FINAL SIGMA is changed to NON-FINAL SIGMA.3

The point to see in this discussion of input methods is that the mapping of keystrokes into characters is potentially a complex process involving many-to-many mappings, just as in the rendering process. Users familiar with systems designed to support English would certainly be familiar with keyboards that use one-to-one mappings only. But keyboard processing need not be limited in this manner, and many keyboard implementations are not.

Copyright notice

(c) Copyright 2003 UNESCO and SIL International Inc.



Note: the opinions expressed in submitted contributions below do not necessarily reflect the opinions of our website.



1 It will not, however, cover systems for Far Eastern scripts (Chinese, Japanese, Korean) that typically require sophisticated input method editors (IMEs).
2 We will use the term keystroke to refer to the pressing of any basic (non-modifier) key in combination with zero or more modifier keys. By modifier keys, we mean keys such as  Alt ,  Control ,  Shift ,  Alt-Graph ,  Option , and  Command .
3 As to be expected, there are many possible implementations for a Greek keyboard. Another implementation might provide separate keystrokes for final versus non-final characters.

© 2003-2024 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Writing Systems Technology team (formerly known as NRSI). Read our Privacy Policy. Contact us here.