NRSI: Computers & Writing Systems

Short URL: http://scripts.sil.org/NRSIUpdate17

NRSI Update #17 – July 2002

Lorna A. Priest (ed.), 2002-07-01

Introducing Sharon Correll

by Sharon Correll

Strictly speaking I’m not really new to the NRSI, because I was “on loan” from the Language Software Development department for about three years (1998 - 2001) as part of the Graphite development effort. Non-Roman scripts proved addictive, and I am now officially a member of the NRSI.

My first job in computing was at my alma mater, the University of Delaware, developing computer-based instruction materials. Although this seems a far cry from working with non-Roman scripts, I had an opportunity to be part of a team that was creating an adventure game for foreign language instruction. During this project I learned the Prolog programming language and worked on parsing and correcting French grammar and spelling. The concepts involved in this sort of work would turn out to be similar in many ways to what I later discovered in Graphite. I also completed a master’s degree at the U of D with a focus on artificial intelligence, and those studies exposed me to many of the ideas in non-traditional programming and conceptual modelling that are relevant to the kind of applications that SIL is developing.

I joined SIL around the time I was doing my master’s and moved to Dallas in 1991 to start my first assignment in what was then the Academic Computing Department. I was a member of the CELLAR programming team, building the underpinnings of what would eventually become LinguaLinks. More recently I’ve been helping with the latest generation of tools, FieldWorks, and specifically I’ve done a lot of work on WorldPad, the text editor that can render complex scripts using Graphite.

Now that I’m full-time with the NRSI, I’m hoping to put some effort into improving the Graphite package (see related article). Eventually I’d like to develop smart fonts using Graphite and OpenType and develop more expertise in Unicode.

Not all my life revolves around programming. In my spare time I love doing music: leading worship at my church and singing with groups like the Dallas Symphony Chorus and the chorus of The Dallas Opera. My dream is to someday be involved in a music ministry using classical music. But until then, Graphite is a fun and rewarding project to be a part of!

Introducing Sheila Harrison

by Sheila Harrison

I was born in Illinois and lived in Hawaii and California before finally settling in Nashville, Tennessee. After living in Nashville for over 20 years, my husband, David, and I came to Dallas in 1991. David worked in the Academic Computing Department until his death in 1995. We came as guest helpers, and I became an STA/STM in 1996. Presently I am an MIT, waiting for membership status (when I’ll become real).

I served in the International Literacy Department for ten years as Office Manager and Secretary, and the decision to seek another assignment was a difficult one. I joined the NRSI team last November as Project Manager. I am excited about my new assignment and am keeping busy learning about complex scripts, growing in my new role, and helping the NRSI team to stay on schedule.

Currently I am in a Management program at Mountain View College, a local community college. When I’m not at work or at school I enjoy my small flower garden, cooking, and my six grandchildren.

International honors at bukva:raz! for Gentium, a new typeface by Victor Gaultney

by Peter Martin

We are pleased to announce that the NRSI’s Victor Gaultney has received recognition for his work on Gentium, an original typeface design he is developing as part of his MA in Type Design at the University of Reading, England. Encouraged by his professor, he entered the regular and italic faces in bukva:raz! (tr. “letter one!”), an international competition to identify the best 100 typefaces of the last five years. The panel of judges from around the world met in Moscow in December 2001 to select the winners. Gentium was placed in the top 100 (rankings were not given).

The design will get its own spread in the “Language, Culture, Type” book published next year by Graphis, and will be displayed in the exhibit at ATypI Roma this September, in exhibitions in Moscow and St. Petersburg, and at the General Assembly Building of the United Nations Headquarters in New York City, in early 2003.

Among Gentium’s distinctives are: original design; extensive set of Latin, Greek and Cyrillic glyphs with consistent design; excellent readability; economy of space suitable for long documents; capitals and ascenders designed to accommodate diacritic combinations. The sample below shows a selection of the extended Latin range:



From the ATypI website: “bukva:raz!, the international type design competition, is part of a special, tri-partite programme of the Association Typographique Internationale (ATypI) dedicated to the Year of Dialogue among Civilizations, 2001. The programme received the enthusiastic endorsement of Mr. Giandomenico Picco, Personal Representative of the Secretary-General for the United Nations Year of Dialogue among Civilizations, in May 2000, and is an official part of the global campaign for the Year of Dialogue coordinated by his office.”

Congratulations from the team, Victor!

Links:

Victor’s academic pages: http://www.sil.org/~gaultney/

l’Association Typographique Internationale (ATypI) website: http://www.atypi.org/

bukva:raz! information: http://www.atypi.org/bukvaraz/

ATypI Conference 2002 in Rome: http://www.atypi.org/rome2002/index.html

TECkit—new and improved

by Jonathan Kew

TECkit (the Text Encoding Conversion toolkit) is a system for defining and implementing the conversions or mappings between the custom 8-bit encodings used by our old “special character” solutions and the Unicode standard. The system consists of a simple language for writing these mappings, a shared library that implements the actual conversion process, and simple tools for applying conversions to plain-text and Standard Format files.

A preliminary version of TECkit was distributed on the CTC 2000 resource CD. Since that time, there has been considerable further development of the system, and TECkit version 2 is currently in testing. This version adds support for more complex encodings than could previously be implemented; in particular, it has better support for mappings where reordering is required, such as many existing systems for Indic/SE Asian scripts. It also removes the dependency on Perl to run the mapping compiler.

The package includes the information needed for developers to integrate the TECkit conversion engine into other applications; products such as Paratext and FieldWorks expect to take advantage of the TECkit engine to support import and export of data in legacy encodings. Sample code showing how TECkit can be used from Visual Basic and VBA (e.g., from within MS Word) is also included.

At the time of writing, the latest TECkit test release is available here.

Corporate Strategy for Transition to Unicode—Summary of Recommendations to the Language Software Board

by Peter G. Constable, November 7, 2001

We are quickly reaching the point in the development of commercial and SIL software at which it will be both practical and advantageous to use Unicode. Therefore, we need to plan now for how we will make a transition to working with Unicode. This document aims to set forth a strategy for that transition.

The following recommendations are being made:

  1. Unicode applications. Software products developed using corporate resources should support Unicode. This should include all new products and new versions of existing products with the following possible exceptions: maintenance releases, interim products specifically planned as part of a transition process, or when an explicit decision has been made with approval of the Language Software Board to develop a product to support on-going use of legacy encodings.
    • Current Microsoft software has very good support for Unicode, and is expected to have relatively good support for complex-script rendering, except where private-use (PUA) characters are needed. Users should be encouraged to use Windows 2000 or later versions of Windows. We expect to be able to recommend the next version of Word for a large number of users. Those requiring complex-script rendering for PUA characters should use WorldPad for general word-processing needs.
  2. Scope and priority. Because we are not only publishing, but eventually also archiving data, the scope of the transition will initially be to active language projects that will benefit from making the transition for their day-to-day work, then to all active projects, and then finally to all projects in which we have been significantly involved.
  3. Requirements for transition. Accomplishing a transition will require attention in three main areas:
    • software (conversion tools, Unicode-capable language software applications)
    • writing-system resources (encoding conversion mappings, fonts, keyboards, PUA character semantics descriptions)
    • support and training
  4. FieldWorks. A major attempt should be made to help the majority of potential FieldWorks 2 (FW2) early adopters make the transition ahead of the Translation Editor release so that the conversion issue does not hinder acceptance of FW2. We expect this attempt to succeed, but progress would be evaluated at the start of the last development milestone phase. Should we find we have not succeeded, we would extend the delivery date of the Translation Editor in order to add the ability to work directly with legacy encodings in FW. This decision would be ratified by the LSB.
    • It should be noted that this strategy may be interpreted as requiring an exception to several of the LSB’s Usability Requirements.
  5. Writing-system resources for FW2. NRSI will coordinate an attempt to see that Unicode-capable writing-system resources (mappings, fonts, etc.) for a majority of potential early adopters are available prior to delivery of FW2. NRSI will endeavour to contact every entity and to assist them in this effort. They will focus as much as possible on providing training and consultation for field support personnel, but will produce the resources where needed.
    • Field entities have a critical role in success of this strategy since information about legacy encodings can only come from them. The LSB Area representatives (whether through Area administrations or by direct communication with entities) must convey to the field entities the strategic importance of this process and the need for entities to take responsibility to see that the resources needed for their teams to work with Unicode are being developed.
  6. Character encoding conversion tools. A software package for character encoding conversion should be completed and released as soon as possible. This will be done using resources under the direction of the SDMT.
    • This package should include an encoding conversion component that can be used by the stand-alone conversion tools in this package, or by other SIL applications. It should be able to perform conversion on plain-text data or on SFM data (with the ability for different SFM fields to undergo different conversions). This package should include stand-alone tools for converting plain-text and SFM data files as well as for converting data on the clipboard.
  7. Conversion of LinguaLinks data. The SDMT should see that LinguaLinks be revised to support the ability to convert character encoding of data to Unicode as part of its XML process, and also when placing data on the clipboard. This conversion should be done using the conversion engine mentioned in item 6. (Conversion for XML export is needed to generate valid XML data.)
  8. Conversion of Shoebox and Paratext data. While it would be possible to revise Shoebox and Paratext to provide conversion to Unicode as part of file export or clipboard operations, this could also be done for SFM data files and, in many situations, for clipboard data using external utilities described in item 6. We recommend that such revisions not be made to Shoebox.
  9. Conversion of Word data. The SDMT should see that a Word add-in is developed that uses the encoding conversion engine described in item 6 to convert data in Word documents.
  10. Future of CC. The LSB should establish a committee to evaluate the future of general text-processing tools (in the genre of Consistent Changes) within the corporation. This committee should assess the likely future need for such tools, taking into consideration the full range of current uses of CC and the options with regard to Unicode-capable tools, and report back to the LSB with recommendations.
  11. Typesetting. It is recommended that IPub and NRSI continue to research long-term solutions for scripture typesetting using Unicode. In the interim, IPub should identify solutions for bringing Unicode-encoded data into existing typesetting processes.
  12. Training. Before the end of 2001, the SDMT, NRSI and JAARS Computer Training department should examine the need to provide Unicode-related training for field-support staff specifically in order to meet the short-term need of developing writing-system resources as part of the effort described in item 5.
    • The JAARS Computer Training department should plan to provide a training focus on Unicode for the recurrency sessions prior to CTC 2002. NRSI should work together with them in preparing and teaching these sessions.
    • The JAARS Computer Training department should assess future needs in relation to the Special Characters component of the computer orientation program and should work together with NRSI in preparing new curriculum.
    • The JAARS Computer Training department should assess the need for training materials related to Unicode-based solutions for writing system support to be included with FieldWorks or other SIL software products.
  13. Archiving. Recommendations should be made to various entities with regard to the use of Unicode in archiving of language data:
    • The SIL International Administration is recommended to ensure that the Language and Cultural Archives department is provided with the resources needed to establish processes for using Unicode in the archiving of language data, to document encodings and Unicode character mappings for currently archived data, and also to convert currently archived data as deemed useful by that department.
    • Field entities are recommended to develop policies and procedures for archiving language data using Unicode.
    • SIL Area administrations should ensure that entities within their area that close operations have defined Unicode character mapping tables covering all of their language projects prior to closing. Area administrations should also encourage those entities to submit data to corporate archives encoded in Unicode.
  14. SDF. Unless the fallback plan described in item 4 becomes necessary, International resources should not be used for further development of SDF technology, and SDF should not be supported in new SIL language software products.
  15. Macintosh. It is to be noted that there is currently no indication of a significant selection of Macintosh business applications becoming available that will support Unicode and complex-script rendering. Except for users working with writing systems that do not require complex-script rendering or that do not require more than SIL software (assuming the future arrival of Mac versions of FieldWorks applications), no migration path to Unicode can be recommended at this time.
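The recommendation on conversion tools asks that different SFM fields be able to undergo different conversions. The Python sketch below shows one way that requirement could look in practice. It is only an illustration under stated assumptions, not the design of the actual package: the field markers `lx` and `ge`, the single-entry vernacular table, and the choice of Windows-1252 for national-language fields are all hypothetical.

```python
from typing import Callable, Dict

Converter = Callable[[bytes], str]


def decode_with_table(table: Dict[int, str]) -> Converter:
    """Build a converter from a byte-to-character table (ASCII passes through)."""
    def convert(data: bytes) -> str:
        return "".join(
            table.get(b, chr(b) if b < 0x80 else "\uFFFD") for b in data
        )
    return convert


# Hypothetical legacy encoding for vernacular fields: byte 0xE0 stands for
# LATIN SMALL LETTER ENG (U+014B).
vernacular = decode_with_table({0xE0: "\u014B"})


def national(data: bytes) -> str:
    """National-language fields are assumed here to be Windows-1252."""
    return data.decode("cp1252")


# Which converter applies to which field marker (markers are examples only).
FIELD_CONVERTERS: Dict[str, Converter] = {"lx": vernacular, "ge": national}


def convert_sfm_line(line: bytes, default: Converter = national) -> str:
    """Convert one SFM line, choosing the converter from its field marker."""
    if not line.startswith(b"\\"):
        return default(line)  # continuation line without a marker
    marker, sep, content = line.partition(b" ")
    name = marker[1:].decode("ascii")
    convert = FIELD_CONVERTERS.get(name, default)
    return "\\" + name + sep.decode("ascii") + convert(content)
```

The same byte, 0xE0, becomes ŋ in an `lx` field and à in a `ge` field, which is why a single file-wide conversion is not enough for SFM data.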

Roman Font Strategy—The Future of Encore Fonts

by J. Victor Gaultney

New technologies require revised tools and new strategies. The technical advancements in operating systems and software, such as Unicode, require us to rethink how we want to meet the need for fonts for the millions that use writing systems based on Roman and Cyrillic alphabets.

Planning for this next generation started over a year ago and began to take shape last fall. Since then, intensive work has begun, with the hope of releasing at least one ‘next generation’ font by May 2003.

The technical requirements

A new generation of Encore Fonts will need to meet our needs for many years into the future. To be adequate for the wide range of needs (publishing, literacy, linguistics, translation, electronic publishing), a revised Encore package would need to:

  • Be encoded according to the Unicode standard
  • Contain all glyphs needed for Roman-based writing systems, both orthographic and phonetic. This includes both base glyphs and diacritics, as well as glyphs for which Unicode characters have not yet been assigned (these would go in the PUA).
  • Contain all glyphs needed for Cyrillic-based writing systems. Because of the prevalent cross-over between Roman and Cyrillic lettershapes (use of Roman glyphs to extend primarily Cyrillic typefaces, and vice-versa), especially in minority languages, a full set of Cyrillic glyphs (including extensions) is needed.
  • Contain smart font code (Graphite/OpenType/AAT tables) when the writing system requires it for correct display, such as for diacritic positioning.
  • Be usable on a variety of OS and application platforms. Although Windows is clearly the primary environment in which they will be used, potential use on Macintosh, Linux and other systems should be kept in mind and not hampered.
  • Be available in a variety of typeface styles and families, suitable for a wide range of publishing needs. At a minimum, all the existing Encore font families are needed. Additional families may be needed to meet publishing needs, especially electronic publishing and the web.
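Because some glyphs would be encoded in the PUA until Unicode assigns them characters, it can be useful to know which characters in a piece of text depend on such private assignments. The following Python sketch, which is an illustration only and not part of any Encore tool, lists them using the standard `unicodedata` module. The function name is made up for this example.

```python
import unicodedata
from typing import List


def private_use_chars(text: str) -> List[str]:
    """Return the distinct private-use characters in text, in code point order.

    Unicode gives every private-use code point the general category "Co".
    """
    return sorted({ch for ch in text if unicodedata.category(ch) == "Co"})


def describe(text: str) -> List[str]:
    """Format each private-use character as a U+XXXX label."""
    return ["U+%04X" % ord(ch) for ch in private_use_chars(text)]
```

Text that returns an empty list here relies only on standard Unicode characters and should display in any font that covers them. Anything else needs a font that agrees on what those private code points mean.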

A streamlined strategy

With all this in mind, a new strategy has been prepared to meet these needs as quickly and simply as possible. It centers around the development of ‘global’ fonts that, thanks to Unicode and smart font technologies, will meet 90-95% of Roman and Cyrillic needs. No longer will we need to have SIL Doulos Cameroon, SIL Doulos PNG, SIL Doulos Mexico, etc. Most of us will only need one font: SIL Doulos. Yes, a few special-purpose fonts will be necessary, but only for those with very unusual/complex needs.

The strategy also aims to release fonts as soon as possible in the development process, in order to make them available for use. So the strategy has three phases:

Phase I - Provide a single global Roman/Cyrillic font for Unicode transition

This phase will deliver a single smart font, based on SIL Doulos-Regular, that will meet 90-95% of Roman/Cyrillic needs around the world. This will require revisions to existing glyphs and a large number of new glyphs in order to support important Unicode ranges. It will also require development of tools related to design automation, information management, smart font code development and testing.

Begun: Jan 2002. Beta test: Nov 2002. Release: May 2003.

Phase II - Complete remaining font families and developer tools

This phase will provide a larger set of smart Roman/Cyrillic fonts, including other styles (italic, bold) and other typeface families (monospaced, sans-serif, publishing), similar to our current Encore 3 fonts. It will also complete development of first-generation font tools needed for making the fonts work for a wider variety of languages.

Start: Nov 2002. Beta tests: mid to late 2003. Release: July 2004.

Phase III - Further broadening and refinement of fonts and tools

This phase will increase the fonts available and make the font tools easier to use and understand.

Start: mid-2004. Release: unknown.

And what about TypeCaster?

TypeCaster, and other associated tools, have been needed because of the plethora of different font encodings in use, and to construct composite (base + diacritic) glyphs. With Unicode and smart font technologies, the need for a font compiler is nearly eliminated. Unicode specifies exactly how the glyphs ought to be encoded. Smart fonts use built-in rules to handle diacritic positioning, etc. The fonts will not require compilation or field modification!

So, there is no plan to revise TypeCaster, or replace it with a similar tool. In the plan above, there is talk of ‘font tools’, but these will initially be very rough, textual tools used by the Encore Fonts development team. Development of special-purpose fonts would also require these tools, but such development would be done in close coordination with the NRSI Roman Font Team.

New Font tool—FontLab

by Martin Hosken

It’s not often that font designers get new toys to play with, but at last there is a tool that has the capability to really help font designers to become more productive. FontLab has been developed over a number of years and is now at version 4. It is a powerful font design package which allows you to do many things not available in other packages, or for which you would have to use a suite of tools. You can do hinting, TrueType point control, composite creation, and all kinds of things. The most exciting feature is that FontLab is now scriptable using the Python language, which opens up all sorts of possibilities for allowing the font design process to integrate into other activities necessary in font production. For example, we have a whole database of information used in deciding which glyphs are needed in the single global Roman Font (mentioned in Phase I above). The scripting capabilities allow information flow back and forth with that database, saving a poor font designer’s fingers and wrists!

FontLab is certainly a high end tool, at a high end price. It also has that high end feeling of having lots of power at your fingertips but complete confusion over how to get at it. And many have been reluctant to switch from good old Fontographer because of that. And there is certainly no reason or expectation that anyone using such a tool should need to change. If you are using Fontographer and are happy with it, please don’t feel any pressure to change. Switching is not without a learning curve. Having said that, those who are using FontLab seem to be happy with it, bugs notwithstanding (here is Victor Gaultney’s perspective: http://www.sil.org/~gaultney/FogFL/).

New characters for Unicode

by Jonathan Kew

Two NRSI proposals for additions to the Arabic script block of Unicode have recently been accepted by the Unicode Technical Committee, one in November 2001 and one in February 2002. The first of these proposals was for three additional letters used in some EEG projects; the second was for a variety of diacritics and other marks used in some South Asian languages. The characters are listed (along with other characters from a number of other sources) in the Unicode “Proposed New Characters: Pipeline Table” at: http://www.unicode.org/unicode/alloc/Pipeline.html

The acceptance of these two proposals should be seen as an encouragement not only to the particular language projects where these characters will be used, but also to others. It confirms that, given appropriate supporting documentation, we can have characters needed for minority languages added to the Unicode standard. This means that as fonts and applications are updated to support newer versions of the standard, these characters will be increasingly widely supported, and the minority communities that need them will be able to use their own languages in standard, mainstream computing products for the first time.

Graphite Development News

by Sharon Correll

Graphite development has been fairly quiescent for the last year and a half, with the exception of a few bug fixes here and there. But we expect that to change in the near future, when we will be able to begin adding some new functionality to the system. Here are some ideas the Graphite development team is discussing:

  • DeCOMize. Graphite is currently implemented as a COM interface, but removing the interactions with COM would make it more attractive to the open-source community and would be a preparatory step to porting the system to another platform, such as Linux.
  • Vertical support. Vertical rendering is used for complex scripts like Mongolian and SignWriting. Adding vertical support would involve not only extending the Graphite engine but also adding vertical paragraph layout to the WorldPad text editor.
  • Unicode surrogate support. Currently it is theoretically possible to handle Unicode surrogates (two 16-bit words forming a single character) by means of the Graphite substitution process. But it would be greatly preferable for such support to be built directly into Graphite.
  • Bidi enhancements. To be fully Unicode compliant, the Unicode bidi algorithm implemented in the Graphite engine should be enhanced to handle embedding and override characters.
  • Justification. Justification is needed to create layout that is intended to approach publication quality. Techniques for justification vary widely from simple inter-word spacing to the addition of kashidas and glyph ductility (stretching). The original intention was that Graphite could support at least the first two of these, but more analysis and design are needed.
  • Normalization. It could be useful to add a layer to automatically impose Unicode normalization on the character data before rendering. This would prevent Graphite font developers from having to support many arbitrary character orderings.
  • Fix rounding errors. We’ve discovered some rounding errors in the Graphite engine that result in the glyph positions being off by a pixel or so from their expected locations.
  • Optimization. It’s possible that there are ways that we could improve Graphite’s performance, so it may be worth spending some time analyzing the engine to evaluate what is possible in this area.
  • Font file reorganization. It may be useful to allow the Graphite tables to be located in a separate file from the actual font. This would allow adding Graphite support to fonts that cannot be modified due to licensing restrictions.
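Two of the ideas above, surrogate support and normalization, rest on standard Unicode mechanics that are easy to show in a few lines. The Python sketch below is only an illustration of those mechanics and says nothing about how Graphite does or would implement them. The function names are made up for this example.

```python
import unicodedata
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogates")
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine two 16-bit surrogate words into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def canonically_equal(a: str, b: str) -> bool:
    """True if two strings are the same once put into canonical (NFD) order."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

For example, U+1D11E is stored in UTF-16 as the two words D834 DD1E. An "a" followed by an acute accent and a dot below compares equal under `canonically_equal` to an "a" followed by the dot below and then the acute. Normalizing before rendering would mean a font's rules only have to handle one of those two orderings.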

There are also a few Graphite fonts that are under development.

  • IPA - a beta version of a Unicode IPA font (SIL Doulos) with Graphite support is available from Martin Hosken.
  • Khmer - under development by Maurice Bauhahn; see http://www.bauhahnm.clara.net/Khmer/Welcome.html.
  • Arabic - under development by Bob Hallissy; a date has not yet been set for the release of the font.
  • Burmese - under development by Martin Hosken.
  • Roman - the complete Roman Unicode font to be released in 2003 will include some Graphite support (see related article).

If you are interested in the future of Graphite, please sign up on our e-mail discussion lists by visiting Graphite. There are three lists that may be of interest:

  • GDL: discussion of Graphite’s features and behavior, particularly of interest to smart font developers.
  • Graphite-devl: discussion of Graphite’s implementation and architecture, for participants in the open-source development process.
  • Graphite-announce: significant Graphite news (receive only).

Circulation & Distribution Information

The purpose of this periodic e-mailing is to keep you in the picture about current NRSI research, development and application activities.


© 2003-2017 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.