WSTech: Writing Systems Technology (formerly known as NRSI)
Guidelines for Writing System Support: Technical Details: Smart Rendering: Part 3
UNESCO project Initiative B@bel
9.3 Glyph Processing—Smart Fonts
Although many good WSIs have used the dumb fonts approach, it has limited usefulness in modern systems. The best opportunities for solid WSI development rest in the use of smart font technologies.
Smart fonts are those with a more complex glyph processing model. This simple definition admits a wide range of approaches. The skill in creating a glyph processing model lies in converting the original string of glyphs into a new string of glyphs in which each glyph is appropriately positioned. A number of basic principles are common to all the technologies, though they differ a little in each implementation. The three main TrueType-based smart font rendering technologies are Graphite, OpenType, and AAT (Apple Advanced Typography).
This discussion will follow the Graphite processing model most closely while interacting briefly with the other two technologies.
In smart glyph processing, the glyph stream goes through two key phases with the possibility of a third. The first phase is called substitution and is concerned with ensuring that the glyph string consists of the right glyphs in the right order in the string. Thus it involves such concepts as glyph replacement and re-ordering. The second phase—positioning—is concerned with ensuring that the glyphs are correctly positioned. It involves such concepts as kerning and shifting. The last phase—justification—may or may not be used. It involves working with the application to increase or decrease the space taken up by the string when it is rendered. It involves such concepts as kashidas and maximum allowable space at various points through the string.
Figure 2: Smart glyph processing overview
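The three phases can be sketched as a small pipeline. This is only an illustration of the model described above, not how Graphite, OpenType, or AAT implement it; the `Glyph` class, the `LIGATURES` and `ADVANCES` tables, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    name: str
    advance: int  # advance width in font units
    x: int = 0    # horizontal position, filled in by the positioning phase


# Hypothetical rule tables, invented for this example.
LIGATURES = {("f", "i"): "fi"}
ADVANCES = {"f": 300, "i": 250, "fi": 520, "n": 500}


def substitute(names):
    """Phase 1: ensure the string holds the right glyphs in the right order."""
    out, i = [], 0
    while i < len(names):
        pair = tuple(names[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(names[i])
            i += 1
    return out


def position(names):
    """Phase 2: place each glyph (here only the default advance-width layout)."""
    glyphs, x = [], 0
    for name in names:
        glyphs.append(Glyph(name, ADVANCES[name], x))
        x += ADVANCES[name]
    return glyphs


def justify(glyphs, target_width):
    """Phase 3 (optional): spread the extra width evenly over the gaps."""
    if len(glyphs) < 2:
        return glyphs
    natural = glyphs[-1].x + glyphs[-1].advance
    extra = target_width - natural
    gaps = len(glyphs) - 1
    for idx, glyph in enumerate(glyphs):
        glyph.x += extra * idx // gaps
    return glyphs


def render_pipeline(names, target_width=None):
    glyphs = position(substitute(names))
    if target_width is not None:
        glyphs = justify(glyphs, target_width)
    return glyphs


for glyph in render_pipeline(["f", "i", "n"], target_width=1100):
    print(glyph.name, glyph.x)
```

Justification is kept as a separate, optional step because, as described later, it depends on decisions made by the application rather than by the font alone.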
The substitution phase involves transforming the input glyph string to ensure that it contains the right glyphs in the right order. Thus all the approaches of normal string transformation are needed. The following capabilities are particularly required:
9.3.2 Bidirectional Ordering
The whole issue of rendering right to left scripts is fraught with difficulty. Not only do we have to consider which direction a string is to be rendered in, but also the wider directional context in which the string is to be rendered. Thus rendering a left to right string in a right to left paragraph is a very different proposition from rendering it in a left to right context. In addition, in Arabic, for example, numbers are written left to right. Thus, not only might the string need to be rendered in a different order, but parts of the string may need to be rendered in a different direction to other parts.
The real complexities appear when it comes to line breaking. Consider some Hebrew text embedded in an English paragraph, with a line break occurring in the middle of the Hebrew. Which Hebrew character sits next to the last English character before the Hebrew depends on whether or not the line breaks inside the Hebrew run.
Figure 3: Bidirectional line-breaking examples
To help in understanding which order we are talking about at any time, we talk of logical and surface order. Logical order is the same as the order in which the encoded string is stored, that is, in reading order. Thus the first letter of a sentence precedes the second regardless of where they are positioned relative to each other.
Surface order is the order in which glyphs are rendered and is system dependent. For a system that always renders left to right, surface order may be the opposite of logical order. Most systems use the overall writing system direction for the surface order. For the most part, this means that glyphs are rendered in logical order, but some glyphs may be rendered in the opposite order to logical order, for example, numbers in Arabic text.
AAT and OpenType process everything in surface order and address the line breaking issue outside the rendering issue. Graphite processes the substitution in logical order and then processes the positioning in surface order, which for Graphite, is the underlying writing system order. Thus in Arabic, the numbers would be processed for substitution in reading order and then positioned in reverse reading order.
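The difference between logical and surface order can be illustrated with a toy model of a right-to-left script whose numbers run left to right. This is a deliberately simplified sketch and not the Unicode Bidirectional Algorithm; the function `surface_order` is hypothetical, and uppercase Latin letters stand in for right-to-left letters so that the output is easy to read.

```python
import re


def surface_order(logical: str, render_dir: str) -> str:
    """Return the glyph order seen by a renderer working in render_dir.

    The input is in logical (reading) order. Digit runs are assumed to be
    written left to right inside otherwise right-to-left text.
    """
    # A renderer that lays glyphs out right to left keeps the letters in
    # logical order but must take each digit run in reverse.
    rtl_order = re.sub(r"\d+", lambda m: m.group(0)[::-1], logical)
    if render_dir == "rtl":
        return rtl_order
    if render_dir == "ltr":
        # A renderer that always works left to right sees the mirror image.
        return rtl_order[::-1]
    raise ValueError("render_dir must be 'rtl' or 'ltr'")


print(surface_order("ABC 123 DEF", "rtl"))  # ABC 321 DEF
print(surface_order("ABC 123 DEF", "ltr"))  # FED 123 CBA
```

The `"rtl"` case corresponds to the description of Graphite above: letters stay in reading order and only the numbers are positioned in reverse reading order.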
The input to the positioning phase is a glyph string in the order in which the glyphs are to be rendered. The default activity is to render the glyphs as though this were a dumb renderer with the glyphs lined up in boxes each with the width of the advance width. The task of the positioning phase is to move various glyphs from their default position and perhaps to close up any gaps left behind.
More sophisticated applications, even with dumb rendering systems, allowed for kerning. Kerning adjusts the relative position of two glyphs and also adjusts the positions of all following glyphs. The classic example is the word WAVE.
Figure 4: Kerning
Kerning is a key concept in positioning, but it may be extended to cross-stream kerning, whereby there is movement in the vertical direction as well as the horizontal. Using this concept and the ability to reset the vertical position to the baseline again, it is possible to do some fairly sophisticated positioning. AAT provides very sophisticated contextual kerning as the basis of its positioning.
In addition to kerning, there is the concept of shifting, which is like kerning, but does not involve any other glyphs moving. Thus if we were to shift the “A” in “WAVE” rather than kern it (and leave the “V” unkerned and unshifted) we would see the following:
Figure 5: Shifting
This capability is particularly useful for positioning zero-width diacritics.
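The difference between kerning and shifting can be made concrete by computing glyph positions. The sketch below is illustrative only: the advance widths for "WAVE" and the adjustment of -100 units are invented values, and the function names are hypothetical.

```python
def default_positions(advances):
    """Dumb layout: each glyph starts where the previous advance ends."""
    positions, x = [], 0
    for advance in advances:
        positions.append(x)
        x += advance
    return positions


def kern(positions, index, dx):
    """Move glyph `index` by dx and carry every following glyph with it."""
    return [x + dx if i >= index else x for i, x in enumerate(positions)]


def shift(positions, index, dx):
    """Move only glyph `index` by dx; all other glyphs stay where they are."""
    return [x + dx if i == index else x for i, x in enumerate(positions)]


# Invented advance widths for W, A, V, E in font units.
advances = [900, 700, 700, 600]
base = default_positions(advances)

print(base)                # [0, 900, 1600, 2300]
print(kern(base, 1, -100))   # [0, 800, 1500, 2200]
print(shift(base, 1, -100))  # [0, 800, 1600, 2300]
```

With kerning, the "V" and "E" follow the "A" to the left; with shifting, only the "A" moves and a wider gap opens between "A" and "V".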
Attachment Points
The most powerful way of positioning things like diacritics is the concept of attachment points. Consider the problem of trying to attach an acute accent over an “A”. There are a number of ways of achieving this.
The latter approach is very powerful and can be used even when there are no attachment points actually designed into the glyph. All the positioning phase needs to know is where the attachment points ought to be, even if they do not actually exist in the font file. The advantage of actually drawing the attachment points into the glyph is that they can then be hinted, so that the diacritic attachment is correctly positioned according to the hinting.
The concept of attachment points is so powerful that it forms a core concept in Graphite where it is used to help address the issue of allowable cursor points and the ability to measure existing positioning to be used in positioning other glyphs. OpenType allows positioning by attachment. AAT does not have this capability.
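The arithmetic behind attachment points is simple: the diacritic is placed so that its own attachment point coincides with the attachment point on the base glyph. The following is a minimal sketch under that assumption; the function `attach` and all coordinates are hypothetical and do not come from any real font.

```python
def attach(base_origin, base_anchor, mark_anchor):
    """Return the origin for the mark so that the two anchors coincide.

    base_origin: (x, y) where the base glyph has been placed
    base_anchor: attachment point on the base, relative to the base origin
    mark_anchor: attachment point on the mark, relative to the mark origin
    """
    bx, by = base_origin
    ax, ay = base_anchor
    mx, my = mark_anchor
    return (bx + ax - mx, by + ay - my)


# Invented values: an "A" placed at x=1000 with an upper anchor at its apex,
# and an acute accent with a lower anchor under the middle of its outline.
a_origin = (1000, 0)
a_upper = (350, 720)
acute_lower = (120, 500)
acute_upper = (120, 760)

acute_origin = attach(a_origin, a_upper, acute_lower)
print(acute_origin)  # (1230, 220)

# A second diacritic can be stacked by attaching it to the first one.
second_origin = attach(acute_origin, acute_upper, acute_lower)
print(second_origin)  # (1230, 480)
```

Because each result is an ordinary position, it can in turn serve as the base for further attachments, which is one way of reading the remark above about measuring existing positioning to position other glyphs.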
The positioning phase needs to address such issues as:
Having positioned the glyphs, the primary role in glyph processing is completed, and the data can be handed over to the renderer for final rendering.
But there are other issues that need to be addressed. What happens if the text over-fills the space allocated for it? How should line breaking be handled, particularly in scripts with no inter-word spaces? What about justification in scripts that use kashidas, or do not have much whitespace to widen? What about features?
Justification is done in much closer conjunction with the application than simple rendering. The first two phases can be achieved by passing an appropriate string to the smart renderer and leaving it to do all the work. Justification requires the application to make some decisions about where extra width is to be generated or where width must be reduced.
In order to achieve this, the smart font needs to both provide information to the application about good places to vary width and by how much, and then to make the necessary changes when the application specifies width changes at certain places. Width changes may be achieved by a number of approaches
Each of these techniques provides a way of changing the width that has a greater or lesser impact on the reader. Clearly, it is best to make adjustments which minimize the impact on the reader where possible. But which ones are those? For this reason, smart fonts will return some kind of weighting as to how welcome width adjustment is at each location in the glyph string.
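One way an application might use such weightings is to hand out the extra width to the most welcome locations first, respecting a maximum at each one. The sketch below assumes a hypothetical data shape, a list of `(position, weight, max_stretch)` tuples where a higher weight means adjustment is more welcome; no real smart font API is being described.

```python
def distribute(extra, opportunities):
    """Assign extra width to stretch opportunities, most welcome first.

    opportunities: list of (position, weight, max_stretch) tuples.
    Returns (stretch per position, width that could not be placed).
    """
    stretch = {pos: 0 for pos, _, _ in opportunities}
    remaining = extra
    # sorted() is stable, so equal weights keep their original order.
    for pos, _weight, max_stretch in sorted(opportunities, key=lambda o: -o[1]):
        if remaining <= 0:
            break
        give = min(max_stretch, remaining)
        stretch[pos] = give
        remaining -= give
    return stretch, remaining


# Invented example: two welcome locations and one less welcome location.
opportunities = [(2, 10, 40), (5, 10, 40), (7, 3, 100)]
print(distribute(100, opportunities))  # ({2: 40, 5: 40, 7: 20}, 0)
print(distribute(250, opportunities))  # ({2: 40, 5: 40, 7: 100}, 70)
```

This greedy scheme is a simplification: a real layout engine would more likely share width evenly among locations of equal weight, and a non-zero remainder tells the application that the line cannot be justified by stretching alone.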
9.3.5 Line Breaking
Line breaking can be very simple (after any space) or very complicated (using neural networks). The normal approaches used by applications for handling line breaks are very limited. For example, breaking a line at a particular position may cause the line before to grow and so the new line may still be too long. Not only this, but creating a line break at a position may change the rendering of the lines before and after.
This brings out an important change for applications between dumb rendering and smart rendering. It is not enough to pass text to the rendering system character by character, because the rendering system may need to change how a character is rendered depending on the characters around it. The application therefore needs to pass appropriate lengths of text to the renderer (whatever they might be!). This also means that an application can’t just add characters to a line until it fills up and then move on to the next line. Instead, it is almost easier to have a paragraph layout subsystem that interacts with the renderer in a generic way. This leaves the question of where line breaks might occur to the writing system and so, ultimately, to the font.
So, one of the early phases of processing is to decide on valid line-break positions. Uniscribe does this in the Microsoft context, and ATSUI does line breaking in the AAT context. Both of these use knowledge of Unicode character attributes to decide on line break points. Graphite, on the other hand, does line breaking using glyph attributes and allows them to be changed contextually. This has the disadvantage that many of the Unicode character attributes need to be reflected into the font as glyph attributes, but it does allow for more sophisticated language-based contextual control.
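The idea of a break attribute that can be changed contextually can be sketched as follows. This is not Graphite's rule language or API: the glyph names, the `DIGITS` and `UNITS` sets, and the rule itself (no break between a number and its unit) are assumptions made up for illustration.

```python
# Hypothetical glyph classes for this example.
DIGITS = set("0123456789")
UNITS = {"km"}


def break_opportunities(glyphs):
    """Return the indices i where a line break is allowed after glyphs[i]."""
    # Default attribute: a break is allowed after any space glyph.
    allowed = [glyph == "space" for glyph in glyphs]
    # Contextual rule: forbid the break when the space sits between a digit
    # and a unit, so that "5 km" is never split across lines.
    for i in range(1, len(glyphs) - 1):
        if (glyphs[i] == "space"
                and glyphs[i - 1] in DIGITS
                and glyphs[i + 1] in UNITS):
            allowed[i] = False
    return [i for i, ok in enumerate(allowed) if ok]


glyphs = ["w", "a", "l", "k", "space", "5", "space", "km"]
print(break_opportunities(glyphs))  # [4]
```

A character-attribute approach would report both spaces as break points; the contextual rule removes the second one, which is the kind of language-specific control the paragraph above refers to.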
The features mechanism allows a particular run of text to pass information to the glyph processor about how the string should be rendered. For example, a feature may be used to indicate whether certain ligatures should be made or not. Each feature has a name and also a number. The name allows users to interact with the feature and the number allows software to interact with the feature. Features can take various values, each with their own name. A user may set a particular feature to a particular value for a run of text, and the glyph processor can use the knowledge of the value of a feature to change how it processes the glyphs.
OpenType, on the other hand, uses the feature concept in a very different way. Firstly, features are a binary concept: a feature is either set or not set. Secondly, they are used as a way for the higher level processor (Uniscribe) to pass extra information along with a glyph ID to the font for processing. For example, Uniscribe converts Arabic characters to the nominal form and then marks whether a glyph is word initial, medial, final or isolate using a feature for each. Thus the string passed to the font for processing has different features set for different glyphs.
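The per-glyph marking described for Arabic can be sketched as follows. This is a heavily simplified illustration and not Uniscribe's actual algorithm: the `JOINING` table covers only five letters, the function `positional_features` is hypothetical, and diacritics, joiners, and other transparent characters are ignored. The tags `init`, `medi`, `fina`, and `isol` are the standard OpenType feature tags for the four positional forms.

```python
# Minimal joining-type table: "D" joins on both sides, "R" joins only to
# the preceding letter. Only a handful of letters are listed here.
JOINING = {
    "\u0643": "D",  # kaf
    "\u062A": "D",  # teh
    "\u0628": "D",  # beh
    "\u0627": "R",  # alef
    "\u062F": "R",  # dal
}


def positional_features(text):
    """Return one OpenType positional feature tag per character of text."""
    tags = []
    for i, ch in enumerate(text):
        prev_type = JOINING.get(text[i - 1]) if i > 0 else None
        next_type = JOINING.get(text[i + 1]) if i + 1 < len(text) else None
        this_type = JOINING.get(ch)
        # A letter connects backward only if the previous letter joins forward.
        joins_prev = this_type in ("D", "R") and prev_type == "D"
        # A letter connects forward only if it is dual-joining itself.
        joins_next = this_type == "D" and next_type in ("D", "R")
        if joins_prev and joins_next:
            tags.append("medi")
        elif joins_prev:
            tags.append("fina")
        elif joins_next:
            tags.append("init")
        else:
            tags.append("isol")
    return tags


# kaf, teh, alef, beh (the word "kitab")
print(positional_features("\u0643\u062A\u0627\u0628"))
# ['init', 'medi', 'fina', 'isol']
```

The font then only needs a substitution for each tag; the analysis of which glyph is initial, medial, final, or isolated has already been done outside the font.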
(c) Copyright 2003 UNESCO and SIL International Inc.