NRSI: Computers & Writing Systems
NRSI Update #4 – February 1997
Welcome to issue #4 of the NRSI Update!
NRSI Manager’s Report
by Margaret Swauger
The last couple of months have been an unprecedented time of illness and disruption for the NRSI team in Dallas. We’ve spent more time doing less than ever before! Despite the unpleasantness, God has given us encouragement in various forms. One big encouragement for me is that Lori Constable has joined the team as my Administrative Assistant. A brave woman! She is helping me and the rest of the team organize our mountains of documents and paperwork, and respond to inquiries more promptly.
My primary activity for March and April will be to write a 3-year plan for the NRSI to submit to the NRSI Advisory Board and project funders. I will interact with the fields and Areas concerned to try to establish potential needs over that time frame. We rely on project funding and need to give the funders a long-range view of what we hope to accomplish.
If you have suggestions for the future of the NRSI, please let me know, or better yet, discuss them with others in your entity and give me some consensus. We are certainly at the beginning of developing the NRSI and look to the fields for direction in serving your needs.
The NRSI and Field-based Development
by Margaret Swauger
There have been a number of discussions and divergent opinions concerning the role of the NRSI in relation to field-based development. We’d like to clarify that role here.
We view the role of the NRSI, in relation to field-developed NR software, as that of consultant to the developer, and information provider to those looking for solutions. We recognize the value of field development and want to facilitate that development. We also want to communicate to those looking for such software the pros and cons of what has been developed so that others can decide if a solution will meet their needs.
We do not believe our role is to give approval of, or to veto, software development in SIL. We do want to make clear to individuals or groups looking for solutions what each piece of software offers. Many are working hard and quickly to add NR capability to their software. We want those considering incorporating field-based development into their own projects to see any pitfalls ahead of time.
If you have any questions or if you disagree with what I have written here, please let me know. It is important that we have a clear and common understanding of the role of the NRSI so that we can all work together to meet important NR needs.
Introducing the SDF-based Rendering Engine
by Timm Erickson
Some versions of Windows include special support for rendering (displaying and printing) a small number of specific non-Roman scripts. If the scripts you work with are not among those supported, or if you use multiple non-Roman scripts from more than one region of the world in the same document, you may find yourself greatly frustrated.
While long-term, system-wide capabilities are still being discussed and developed, some partial solutions to the non-Roman script issue are already available. One of these is the “SDF-based Rendering Engine”, which is discussed here. [The other, using the Consistent Changes program to render text, is discussed elsewhere in this NRSI Update. - Ed.]
This engine is not a program on its own, and is not an operating system enhancement. It will work only with programs that are made or customized specifically to support it.
This engine was initially designed to meet the rendering needs of Arabic-style scripts. It has been used successfully to provide support for members of other script families, and may work for the scripts you use. Or it may not. See below for features and limitations.
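To illustrate what "the rendering needs of Arabic-style scripts" involve, here is a minimal, generic sketch of contextual form selection: each letter takes an isolated, initial, medial or final shape depending on whether it connects to its neighbours. This is not how the SDF-based engine works internally (the article does not say); the function name and the abbreviated joining tables are assumptions for illustration only, and a real engine must also handle ligatures (such as lam-alef), vowel marks and many more letters.

```python
# Abbreviated joining classes (assumption: only a few common letters listed).
# Dual-joining letters connect on both sides; right-joining letters connect
# only to the letter before them (to their right, in reading order).
DUAL_JOINING = frozenset(
    "\u0628\u062a\u062b\u062c\u062d\u062e\u0633\u0634\u0639"
    "\u063a\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u064a"
)
RIGHT_JOINING = frozenset("\u0627\u062f\u0630\u0631\u0632\u0648")  # alef, dal, thal, reh, zain, waw
JOINING = DUAL_JOINING | RIGHT_JOINING


def contextual_forms(word: str) -> list[str]:
    """Return the positional form each character of `word` should take."""
    forms = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else None
        nxt = word[i + 1] if i + 1 < len(word) else None
        # A letter joins backward only if the previous letter can join forward.
        joins_prev = ch in JOINING and prev in DUAL_JOINING
        # A letter joins forward only if it is dual-joining and the next letter joins at all.
        joins_next = ch in DUAL_JOINING and nxt in JOINING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


print(contextual_forms("\u0628\u064a\u062a"))  # bayt: initial, medial, final
print(contextual_forms("\u062f\u0627\u0631"))  # dar: every letter isolated
```

The second example shows why a simple one-glyph-per-character font is not enough: the same letters take different shapes purely because of their neighbours.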
What the SDF-based Rendering Engine does do:
What the SDF-based Rendering Engine does not do:
The programming for the rendering engine is essentially complete.
Shoebox 3.05 beta includes basic screen support for the engine, and in West Eurasia Group we have MTTs using it already. They have been able to enter data much more accurately with this system than they could using a romanization of their language. I have already developed demos of the system using Arabic-like, Hebrew, Syriac, modern Assyrian, Hindi-like, and Cherokee scripts, and can send them on request.
The next version of LinguaLinks should also have support for the SDF system via the Transduced Fonts mechanism.
Using CC for Non-Roman Text Rendering in LinguaLinks
by Peter Constable
A recent addition to LinguaLinks (LL) has been the ability to use the Consistent Changes program (CC) for complex text rendering. The idea is to have data stored in one encoding, but then to pass each string through a CC table before it gets rendered on the screen or printer. The use of CC for this purpose will be included in version 1.5, though it may not be fully documented in that version. NRSI has been working with the LL staff to assist in evaluating issues relating to implementation of this use of CC in LL. In this regard, I have recently been experimenting with this capability in LL for rendering Thai text.
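The core idea (store text in one encoding, but pass each string through a table of changes just before display) can be sketched as a single left-to-right pass that applies the longest matching rule at each position. This is a minimal sketch assuming a table is just a list of literal match/replacement pairs; real CC tables are far richer, the rule format below is not CC syntax, and `transduce` is a hypothetical name.

```python
from collections.abc import Sequence


def transduce(text: str, rules: Sequence[tuple[str, str]]) -> str:
    """One left-to-right pass: at each position apply the longest matching
    rule; if none matches, copy one character through unchanged."""
    # Try longer patterns first so that, e.g., "ng" wins over "n".
    ordered = sorted(rules, key=lambda rule: len(rule[0]), reverse=True)
    out = []
    i = 0
    while i < len(text):
        for pattern, replacement in ordered:
            if pattern and text.startswith(pattern, i):
                out.append(replacement)
                i += len(pattern)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Stored data keeps the digraph "ng"; only the rendered copy shows the eng glyph.
stored = "nganga"
print(transduce(stored, [("ng", "\u014b")]))  # ŋaŋa
```

Because only the rendered copy is changed, sorting, searching and other analysis can continue to work on the stored encoding.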
Thai script, like other Indic scripts, presents issues of contextualization and ordering of characters; these create problems for rendering on the one hand, and analysis (e.g. sorting) on the other. In experimenting with CC-enabled rendering in LL, I was able to find ways to address some of these concerns. I was generally pleased with the results, though I did encounter certain limitations and problems in implementation. Here is a summary of pros and cons:
The following are important limitations in this system:
The following are minor problems I encountered in the current implementation in LL:
This use of CC is now one of three devices available in LL for handling text rendering issues. For further details on our evaluation of this system, contact , and request the document “Transduced Fonts in LinguaLinks using CC”.
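The ordering problem mentioned above shows up clearly in Thai sorting: the vowels เ แ โ ใ ไ are written, and conventionally stored, before the consonant that precedes them in speech, so a plain code-point sort groups words by vowel rather than by initial consonant. A common workaround is to build a sort key that swaps each such vowel with the following consonant. The sketch below is a simplified illustration (the function names are hypothetical, and real Thai collation also weighs tone marks and other levels); it is not the method used in LL.

```python
# Thai vowels written (and stored) before the consonant they follow in speech.
PREPOSED_VOWELS = frozenset("\u0e40\u0e41\u0e42\u0e43\u0e44")  # เ แ โ ใ ไ


def is_thai_consonant(ch: str) -> bool:
    # Thai consonants occupy U+0E01 (ko kai) through U+0E2E (ho nokhuk).
    return "\u0e01" <= ch <= "\u0e2e"


def thai_sort_key(word: str) -> str:
    """Swap each preposed vowel with the consonant after it, so the consonant
    drives the primary ordering. Tone marks and other collation levels are
    ignored in this simplified sketch."""
    chars = list(word)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED_VOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


words = ["\u0e02\u0e32", "\u0e40\u0e01"]  # ขา, เก
print(sorted(words))                     # code-point order puts ขา first
print(sorted(words, key=thai_sort_key))  # consonant-first order puts เก first
```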
Apple’s NeXT OS
by Victor Gaultney
Late last year, Apple Computer announced that they were acquiring NeXT Software, Inc. and were going to use the NeXTstep/OPENSTEP environment as the foundation for future versions of the Mac OS. In the weeks since then Apple has outlined their plans for this new OS, the future of System 7.x, and key international technologies. Their hope is to produce an OS that can truly compete with Windows NT - within one year.
The new OS - “Rhapsody” - will appear first in a developer release in mid-1997 that will be quite similar to the existing NeXTstep/OPENSTEP system. Based on a new version of the Mach microkernel (along with its Unix ties), Rhapsody will hold on to the best features of the NeXT OS - protected memory, preemptive multitasking and high performance - and will add system-wide support for symmetric multiprocessing. It will, however, require major updates to current applications and may not include support for some current Mac technologies. Rhapsody’s “Premier” release for end users is scheduled for January 1998.
In addition, Apple plans to continue to upgrade the current Mac OS 7.x with substantive additions every six months through 1998. In July, “Tempo” will include a revamped Finder (the biggest change in six years), OS-level Java support and a new version of GX that removes the greatest roadblock for some users - the GX printing architecture - while preserving the graphics and typography features.
Yellow and Blue Boxes
In mid-1998, Rhapsody and System 7.x will be unified in a single system that can run both old and new applications. Rhapsody “Yellow Box” applications will run as before, with all the benefits of memory protection, etc. System 7.x apps will run in a “Blue Box” - a fully hosted implementation of System 7.x within one Rhapsody memory space.
This means that 7.x applications, control panels and system extensions will run unmodified in the new OS. Apple has already shown applications such as Photoshop and Word 6 running within an early version of the “Blue Box” - at speeds comparable to current mid-range systems.
Although the current NeXTstep OS runs on many hardware platforms (including Intel), the main target for Rhapsody is PowerPC-based machines. Apple has said that delivery on Intel platforms is likely, but states clearly that the PowerPC is the priority. Such delivery would be two pronged: you would be able to run the whole Rhapsody OS (minus the Blue Box) on an Intel chip, or you could run individual Rhapsody applications on Windows NT. Neither case allows for running current Mac applications on Intel hardware, though.
Apple is very sensitive to their international market, so they have already stated that WorldScript (in some form) will be one of the few current technologies moved into Rhapsody. The NeXT OS is already based on Unicode but does not have a sophisticated rendering engine or locale resources, items that could be migrated from WorldScript. It is likely, however, that the format of those resources will change.
Rhapsody, like NeXTstep, will use Adobe’s Display PostScript (or a descendant) as the graphics model. QuickDraw GX will be dropped even though it is acknowledged to be a more powerful and sophisticated environment. Key features of GX, however, will migrate to Rhapsody. GX line layout and ColorSync color management will be added quickly (to make up for DPS’s weaknesses), and Apple hopes to graft on other features (such as translucency) at some future time.
Impact on SIL
It is still unclear what impact this change of direction will have for our non-roman work. What is clear is that solutions developed for the current Mac OS 7.x will continue to be supported by Apple into 1999, and that Apple continues to take its international community very seriously.
The bad news here is that Rhapsody may require us to revise our solutions and tools to support new formats for fonts and international resources. It also may mean that few, if any, new GX-based applications will be written and the current ones may receive only incremental updates (although Apple’s move to eliminate the GX printing architecture is intended to encourage more GX development).
The good news is that there seems to be no change in Apple’s global philosophy: the user should be able to use their preferred language, writing system and cultural preferences (date/time/etc.) anywhere in the system and throughout all applications. This means that we are likely to continue to see OS-wide support for multilingual computing, with enough flexibility to support minority languages and scripts.
In addition, the most important part of GX (for our purposes) - GX line layout - will be grafted onto Adobe’s PostScript environment. This link with Adobe will likely give Apple a greater competitive edge in the graphics arena, and may mean that international technologies (like WorldScript and GX line layout) gather more support in mainstream applications.
So at this point, the situation looks promising, but the real test will be to see whether Apple can deliver on their intentions quickly enough to hold onto their market.
Non-Roman Optical Character Recognition
by Dennis Drescher
Optical Character Recognition (OCR) is the process of scanning a printed document and then using software to convert the image into an electronic document that can be reproduced or manipulated. In the past, IPub has done some research in this area. Re-keying text to produce reprints and revisions of long, multi-section documents published before the current generation of computer technology is very labor-intensive, and it was hoped that OCR might offer some relief. However, the error rate was too high, and any time saved over re-keying was lost again in editing.
This past fall I was asked to look into the feasibility of non-roman OCR. If it proved feasible, NR OCR would be of great benefit to those doing linguistic work and other tasks involving non-roman scripts, greatly aiding the collection and analysis of their data. There is much potential.
Because this research was initiated by someone working in a particular field, we decided to use the script from that field for our research. The script was Devanagari, one of the more difficult scripts because of the headstroke line that joins characters together to form word units.
In the course of the initial hardware and software setup, I needed to contact the publisher of the software I was using, OmniPage Pro by Caere Corporation. As I spoke to the technical services representative and tried to explain the project I was involved in, I knew I was in trouble: the person on the other end of the line kept referring to non-roman scripts as “irregular” characters.
This world view seemed to be the basis for this software package. OmniPage Pro doesn’t seem capable of NR OCR. There is a training routine that you can initiate but if the material is not roman-based it yields bizarre results. It would seem, from my point of view, that the current architecture for digesting the graphical image and rendering the encoded text is based on the assumption that the text consists of distinct roman-based glyphs.
For NR OCR to be successful, the OCR software cannot be script-specific. It must be capable of being trained at the individual glyph level. Another shortfall I noted was that current OCR technology only supports a one-to-one association between glyphs and code points. This is unworkable for many NR scripts that involve composite glyphs or large orthographies.
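To make the one-to-one limitation concrete: a single recognized Devanagari conjunct such as क्ष corresponds to a sequence of three encoded characters (KA + VIRAMA + SSA). A trainable recognizer would need a glyph table whose entries map to code-point sequences rather than single code points. The sketch below assumes Unicode as the target encoding; the glyph labels and function name are hypothetical.

```python
# Hypothetical glyph labels an OCR engine might assign during training,
# each mapped to a *sequence* of Unicode code points.
GLYPH_TO_TEXT: dict[str, str] = {
    "ka": "\u0915",                  # क  KA
    "ssa": "\u0937",                 # ष  SSA
    "k_ssa": "\u0915\u094d\u0937",   # क्ष conjunct = KA + VIRAMA + SSA
    "aa": "\u093e",                  # ा  vowel sign AA
}


def glyphs_to_text(glyph_labels: list[str]) -> str:
    """Concatenate the code-point sequence for each recognized glyph."""
    return "".join(GLYPH_TO_TEXT[label] for label in glyph_labels)


text = glyphs_to_text(["k_ssa", "aa"])
print(len(text))  # 4 code points produced from 2 recognized glyphs
```

Under a strictly one-to-one scheme, every conjunct would instead need its own code point, which quickly becomes unmanageable for scripts with hundreds of composite forms.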
I am aware of an Arabic version of OmniPage Pro, and this may be of interest to people using that script. However, as far as I know it is Arabic-specific and cannot be modified to work with any other scripts. At this point it would appear that NR OCR using commercial U.S. software is, for the most part, not yet feasible. However, I realize that my experience is limited. Perhaps some NRSI Update readers have had experience in this area. If so, would you be willing to share it with me? Are you aware of any software that can do NR OCR? Please write me and tell me about it.
Circulation & Distribution Information
The purpose of this periodic e-mailing is to keep you in the picture about current NRSI research, development and application activities.