NRSI: Computers & Writing Systems
NRSI Update #16 – December 2001
In this issue:
Some Impressions of NRTC2 by a Field Linguist
by Bill Jancewicz
The September 11 tragedies nearly derailed the second Non-Roman Technical Consultation (NRTC2), held in Horsleys Green, England, from September 17-22, 2001. The SIL Non-Roman Script Initiative (NRSI) had spent nearly two years preparing for this conference. Airlines were back in the air, and staff and participants were travelling by the weekend; NRSI team members resident in England went the extra mile taking care of arrangements on site as other team members arrived far later than they had expected.
Non-Roman Technical Consultation 2
This conference was held for computer consultants from various entities within SIL around the world, as well as partner organizations and software developers. We all had one thing in common: our work requires the handling of non-Roman scripts on various computer platforms. The administration of my own entity, the North America Branch, sent our computer consultant, who routinely supports our Branch language and support teams with their computer and software concerns. So what was I doing there, just an ordinary working linguist from the Naskapi language project in Northern Quebec? Well, Naskapi uses a non-Roman script which is a subset of Cree Syllabics, and due to the remoteness of our allocation I was asked early on to be involved in developing strategies for using the Naskapi script on computers in the village. The solutions I came up with for Naskapi were later easily adapted to various dialects of Cree and Ojibwe, and after a dozen years on the field I found myself routinely called upon to assist various Aboriginal academic and cultural organizations across the north with their Canadian Syllabics scripts. The invitation to NRTC2 gave me the opportunity to represent the many Canadian Aboriginal organizations and Native Language speakers who would benefit from my attendance. Much of the support I am asked to provide falls beyond the reach of our Branch computer consultant. So, I attended representing not only the North America Branch of SIL, but also users of Canadian Syllabics.
So, how can I characterize the main theme of NRTC2? Pretty much with just one word: Unicode. Until recently, many non-Roman script solutions for the computer have been based upon procedures that “bent the rules.” Because of a pressing need to put our non-Roman languages to work on the computers that were available, we developed fonts and keyboards that were non-standard (now referred to as “legacy” fonts), making the computer do things that the application and system designers never intended. All the while Unicode was in the works and eventually accepted as the standard encoding system that allows enough codespace to accommodate all the world’s scripts. And with the release of Windows 2000 and Macintosh OS X, Unicode support was finally integrated into the operating systems. The importance of acknowledging Unicode in our non-Roman script language projects was a large part of the conference.
NRTC2 was five intensive days of meetings with a well-planned balance of technical presentations, hands-on tutorials and workshops, the sharing of tips and assistance, and guest presentations. We received clear instruction about the various “smart font” technologies, presentations from the authors of Keyman and SILKey, and a member of the multilingual programming team from Microsoft. All of the discussions were underpinned with a solid foundation of Unicode. We learned some “font-fiddling” techniques based on ActivePerl, practiced writing the code for a custom keyboard utility with Keyman Developer, and were introduced to VOLT, Microsoft’s “smart font” Visual OpenType Layout Tool.
A featured presentation was the introduction to the latest version of SIL’s WorldPad, a true multilingual word processor, along with a tutorial on GDL, the Graphite Description Language for programming and rendering “smart fonts”, smarter still than Microsoft’s smart “OpenType” technology. Integration of Graphite is planned for the coming suite of SIL applications known as “FieldWorks”.
I found all the sessions very interesting and helpful, especially since I had recently faced some script-related challenges in my own work with Canadian Syllabics. It was especially useful to come and gauge exactly where the “cutting-edge” of multilingual script technology lies (for the moment) and to evaluate the gaps between what the technology can do and the grass roots work of facilitating national Mother Tongue translators in actually using the technology on a practical level. I was very impressed by the expertise of the NRSI team: we are blessed with a group of extremely professional and helpful non-Roman script specialists in SIL. In my opinion this team provides SIL with an indispensable resource as we approach the script challenges of SIL’s vision for the next 25 years. The entire NRSI team is to be congratulated both for putting together an excellent conference and for their day-to-day work in supporting the needs of non-Roman scripts in various locations around the world.
We Have the Technology
My impression was that nearly anything anyone had hoped to do with non-Roman scripts on computers is finally possible today, or at least within the realm of possibility. The programming and the hardware are practically within reach to handle just about any script challenge that SIL and its partners will face in all of the languages of the world for years to come. But not every script solution will require the full level of the resources available. Let me explain. Just because the “high end” technology of WorldPad and Graphite will indeed work as a Unicode solution for a script like Canadian Syllabics, for example, it does not necessarily follow that I should start using these technologies. In fact, since the Unicode codespace “block” has already been established for Canadian Syllabics based upon “dumb font” technology, I don’t even have to use OpenType smart font rendering to effectively use Canadian Syllabics on our local computer platforms. One thing that I have come away with is that simpler is usually better. Many SIL language programs will be able to get by using standard Roman fonts, for which there are ubiquitous commercial applications available. There are lots of “majority” languages (Russian, Polish, Spanish, French, etc.) that can be handled with the current dumb font technologies. If there is a need for “smart font” rendering or right-to-left (RTL) applications, both Microsoft Windows and Apple Macintosh operating systems now have system-level “smart font” technologies that will do the job (OpenType and AAT), using localized versions of their commercial software. If these commercial solutions fall short of the needs of a certain script, then SIL can handle it with Graphite smart font rendering technology and the applications that will come with FieldWorks, including WorldPad. However, the further up the technology continuum you go, the fewer software applications are available.
If I can get the job done with “dumb fonts”, then I will have a much wider range of applications available that will support my script.
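As a rough illustration of why a “dumb font” can be enough here, the Python sketch below checks that sample text lies entirely inside the Unicode block for Canadian Syllabics (U+1400 to U+167F) and that each character is a single, non-combining code point. The helper names and the sample characters are chosen only for illustration; they do not come from any NRSI tool or from actual Naskapi data.

```python
import unicodedata

# Unicode block "Unified Canadian Aboriginal Syllabics": U+1400..U+167F
SYLLABICS_START = 0x1400
SYLLABICS_END = 0x167F


def is_syllabic(ch: str) -> bool:
    """Return True if the character lies in the Canadian Syllabics block."""
    return SYLLABICS_START <= ord(ch) <= SYLLABICS_END


def describe(text: str) -> list[tuple[str, str]]:
    """List (code point, Unicode character name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def needs_only_simple_rendering(text: str) -> bool:
    """Illustrative check: every non-space character is a syllabic and none
    is a combining mark, so one code point corresponds to one glyph."""
    return all(
        ch.isspace() or (is_syllabic(ch) and unicodedata.combining(ch) == 0)
        for ch in text
    )


# Sample characters picked for illustration only: the syllabics E, I, O, A.
sample = "\u1401\u1403\u1405\u140A"

if __name__ == "__main__":
    for code, name in describe(sample):
        print(code, name)
    print(needs_only_simple_rendering(sample))
```

A check like this says nothing about whether a given font actually contains the glyphs; it only shows that the text itself places no reordering or combining demands on the renderer.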
One aspect that was brought out in the NRTC2 that deserves attention by nearly all SIL entities is that of Legacy to Unicode conversion procedures. As already noted, non-Roman encoding strategies in the years prior to Unicode were developed out of a need to “get the job done”, often at the cost of “breaking the rules” with regard to encoding standards. My own situation was no exception, nor was most of the non-Roman work done by SIL in years gone by. Many fonts for vernacular languages, phonetic scripts or Biblical languages followed an “in-house” encoding “standard”, in actual fact no standard at all. With the promise of Unicode offering a true universal standard, it becomes very important to begin to archive language data in Unicode, ensuring its usability later in years to come.
Also, during this transition phase to Unicode, there are many situations where routine access to data encoded using the older legacy encodings will be necessary, due to the use of older computer systems and applications that are still unable to use Unicode (like Shoebox 5 and Paratext 5). Further, for any kind of electronic publishing that requires the use of non-Roman scripts on the World Wide Web, we have found that conversion of the language data to Unicode is by far the most effective means to provide vernacular language Web access. It may well be that publishing in minority languages on the Internet will be the most efficient way of working in some of SIL’s newest language situations. For these reasons, good software routines for the conversion of language data from legacy encodings to Unicode and back again must be developed. One way to do this is with a Visual Basic script, which was demonstrated during the conference. Consistent Changes tables or other utilities might also be needed. Developing a strategy for converting existing language data to Unicode will need to be considered for every language project that uses any “custom made” characters or diacritics.
For the linguistics and translation work that I am involved in for the Naskapi project and other Canada First Nations languages, we find that we do most of our work using just three primary applications: Microsoft Word for basic word processing and minor publishing, Paratext for translation, checking and revision, and Shoebox for text analysis, dictionaries and lexical work. For all three of these we have used Keyman 3.2 for keyboarding our non-Roman script. Although we have relatively capable Naskapi Mother Tongue Translators involved in the project, and the SIL team has a reasonable amount of computer savvy, we have not been successful in integrating some of the more complex language tools like the CARLA applications or “LinguaLinks” into our language program. We realize that for some language teams, the development of “FieldWorks” as a successor to “LinguaLinks” will truly meet their needs, especially with regard to providing the Graphite rendering technology in linguistic software. However, for the segment of the language programs we find ourselves in, just having our current primary software tools able to work with Unicode as soon as possible will meet our needs the best. For word processing, a Unicode solution is already in place with Windows 2000 and Word 2000. Together with the keyboard utility Keyman 5.0 released in 2001, we now have a very flexible means of keyboarding Unicode text into documents. At the NRTC2 I was shown a beta version of Paratext that has been re-worked to handle Unicode. Once this is released, the only component we would be missing to complete our language project’s transition to Unicode is Shoebox. I was very pleased to hear from the lead programmer of Shoebox at the NRTC2 that he is currently at work on a Unicode compliant version. Although the application will be re-named, and the software development is now the responsibility of a SIL entity in southeast Asia, this is very encouraging news. 
It now appears that all three of our most useful applications, Word, Paratext and Shoebox, will soon all be Unicode compliant.
A Book and a CD
Showing remarkable foresight, the NRSI team also provided two important resources for the participants to take along: a book and a CD-ROM. The book, a preliminary edition of Implementing Writing Systems: An Introduction, featuring articles contributed by members of the NRSI team, was edited by Melinda Lyons. This book, or rather the coming first edition, will be a welcome addition to any language team’s library, and an indispensable manual for orthography implementation for SIL and its partners. I would like to encourage the NRSI team to complete and publish the first edition, with the hope that it would become required reading for every language team, or even be presented as a course in SIL schools. The CD-ROM comes filled with software resources, tutorials, manuals, and web links to all the current non-Roman script resources. In spite of the fact that software is always changing and presents a moving target, for those of us with limited web access, it is invaluable to have so many resources packed onto a CD.
On behalf of all of us working in non-Roman scripts, I want to say a heartfelt “well done” to the entire NRSI team for putting together a very helpful conference along with such high quality resources, and blessings as you continue to help language teams to solve their script challenges.
Graphite and WorldPad: Tools for Writing the World’s Other Languages
TechKnowLogia recently published an article with the above title written by Melinda Lyons. The abstract appears below. If you would like to read the complete article you can access it at: www.TechKnowLogia.org and select “Technologies Tomorrow”. You will have to register at the site to view the article, but it is a short and painless procedure.
Over 6,000 languages are spoken in the world, of which about 2,000 are in countries using non-Roman writing systems. Although computer tools have existed for some time to write the national languages in these countries, the tools often have restrictions that make them unusable for the minorities of these same nations. Graphite, and later WorldPad, was developed to provide smart font rendering capabilities and text editing that enables any combination of symbols when writing any script. The first use of this has been with the International Phonetic Alphabet, which is often used by linguists and others for learning these minority languages. Thus, Graphite and WorldPad provide tools for learning minority languages, as well as for literacy for those minorities.
Graphite and WorldPad mailing lists
by Greg Lyons
Graphite now has its own website. Its focus is on open source development of Graphite and making extensible smart font rendering technology widely available.
There is also a new set of mailing lists for Graphite and WorldPad; contact me for assistance.
Circulation & Distribution Information
The purpose of this periodic e-mailing is to keep you in the picture about current NRSI research, development and application activities.