NRSI: Computers & Writing Systems

Short URL: http://scripts.sil.org/NRSIUpdate04

NRSI Update #4 – February 1997

NRSI staff, 1997-02-01

Welcome to issue #4 of the NRSI Update!

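In this update:

  • NRSI Manager’s Report
  • The NRSI and Field-based Development
  • Introducing the SDF-based Rendering Engine
  • Using CC for Non-Roman Text Rendering in LinguaLinks
  • Apple’s NeXT OS
  • Non-Roman Optical Character Recognition
  • Circulation & Distribution Information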

NRSI Manager’s Report

by Margaret Swauger

The last couple of months have been an unprecedented time of illness and disruption for the NRSI team in Dallas. We’ve spent more time doing less than ever before! Despite the unpleasantness, God has given us encouragement in various forms. One big encouragement for me is that Lori Constable has joined the team as my Administrative Assistant. A brave woman! She is helping me and the rest of the team organize our mountains of documents and paperwork, and respond to inquiries in a more timely manner.

My primary activity for March and April will be to write a 3-year plan for the NRSI to submit to the NRSI Advisory Board and project funders. I will interact with the fields and Areas concerned to try to establish potential needs over that time frame. We rely on project funding and need to give the funders a long-range view of what we hope to accomplish.

If you have suggestions for the future of the NRSI, please let me know, or better yet, discuss them with others in your entity and give me some consensus. We are certainly at the beginning of developing the NRSI and look to the fields for direction in serving your needs.

The NRSI and Field-based Development

by Margaret Swauger

There have been a number of discussions and divergent opinions concerning the role of the NRSI in relation to field-based development. We’d like to clarify that role here.

We view the role of the NRSI, in relation to field-developed NR software, as that of consultant to the developer, and information provider to those looking for solutions. We recognize the value of field development and want to facilitate that development. We also want to communicate to those looking for such software the pros and cons of what has been developed so that others can decide if a solution will meet their needs.

We do not believe our role is to give approval of, or to veto, software development in SIL. We do want to make clear to individuals or groups looking for solutions what each piece of software offers. Many are working hard and quickly to add NR capability to their software. We want those considering incorporating field-based development into their own projects to see any pitfalls ahead of time.

If you have any questions or if you disagree with what I have written here, please let me know. It is important that we have a clear and common understanding of the role of the NRSI so that we can all work together to meet important NR needs.

Introducing the SDF-based Rendering Engine

by Timm Erickson

Some versions of Windows include special support for rendering (displaying and printing) a small number of specific non-Roman scripts. If the scripts you work with are not among those supported, or if you use multiple non-Roman scripts from more than one region of the world in the same document, you may find yourself greatly frustrated.

While long-term, system-wide capabilities are being discussed and developed, some partial solutions to the non-Roman script issue have already been developed. One of these is the “SDF-based Rendering Engine”, which will be discussed here. [The other, using the Consistent Changes program to render text, is discussed elsewhere in this NRSI update. - Ed.]

This engine is not a program on its own, and is not an operating system enhancement. It will work only with programs that are made or customized specifically to support it.

This engine was initially designed to meet the rendering needs of Arabic-style scripts. It has been used successfully to provide support for members of other script families, and may work for the scripts you use. Or it may not. See below for features and limitations.

Features

What the SDF-based Rendering Engine does do:

  • Offers Arabic-style contextualization support, in a way that makes it possible to support other West Eurasian scripts like Hebrew, Greek, Syriac, and modern Assyrian.
  • Offers simple ligature support. This has made it possible to support rendering for Western Panjabi (in an Arabic-like script), Modern Assyrian, Cherokee syllabary and Eastern Panjabi (in a simple Hindi-like script) successfully.
  • Offers basic support for cursor positioning to the calling application.
  • Offers easy customization. Script Definition Files (SDFs) are ASCII text files which can be edited with any text editor. Also, an SDF Editor is being made to make the process of creating and maintaining script definitions easier, faster and more intuitive. If the rendering features of your script fit within the limits of this system, then it should be a fairly straightforward task to make your script work with this system.
  • Allows for linguistically useful encoding of the underlying data. For example, a given letter or diacritic is always encoded the same way and the rendering engine determines the correct contextual form for display.
  • Runs on any version (US or localized) of Windows, version 3.1 or greater.
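
To make the contextualization and encoding points above concrete, here is a minimal sketch of Arabic-style contextual form selection. It is not the SDF file format or the engine's actual logic, neither of which is described here. The joining table, the glyph names such as `beh.init`, and the `contextual_forms` function are all hypothetical, chosen only to show how one stored code per letter can yield different display forms.

```python
# Hypothetical joining classes for a toy Arabic-style alphabet.
# "dual" letters connect on both sides; "right" letters connect only to
# the letter that precedes them in logical (typing) order.
JOINING = {"beh": "dual", "seen": "dual", "alef": "right", "dal": "right"}


def contextual_forms(letters):
    """Pick a display form for each letter of one word.

    `letters` holds the stored data: each letter is encoded the same way
    regardless of where it occurs. The result holds illustrative glyph names.
    """
    glyphs = []
    for i, letter in enumerate(letters):
        prev_letter = letters[i - 1] if i > 0 else None
        next_letter = letters[i + 1] if i + 1 < len(letters) else None

        # A letter joins backwards only if the previous letter joins forwards.
        joins_prev = JOINING.get(prev_letter) == "dual" and letter in JOINING
        # Only dual-joining letters join forwards, and only to a joining letter.
        joins_next = JOINING.get(letter) == "dual" and next_letter in JOINING

        if joins_prev and joins_next:
            form = "medi"
        elif joins_prev:
            form = "fina"
        elif joins_next:
            form = "init"
        else:
            form = "isol"
        glyphs.append(f"{letter}.{form}")
    return glyphs


print(contextual_forms(["beh", "alef", "beh"]))
print(contextual_forms(["seen", "beh", "dal"]))
```

The stored data never changes; only the list of glyph names handed to the display does, which is what keeps the underlying encoding linguistically useful.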

What the SDF-based Rendering Engine does not do:

  • Does not offer any right-to-left (or vertical or even left-to-right) support. That must be programmed by the application involved. (This is why it has not yet been made to successfully support multi-line right-to-left text in Microsoft Word. It’s Word’s fault, not the rendering engine’s!)
  • Does not offer a user interface. It cannot be used unless called by an application or by a macro in Microsoft Word (or Excel, etc.).
  • Does not support glyph repositioning or stretching.
  • Does not get around the 223-glyph limit of Windows fonts. It cannot spread glyphs across multiple fonts or (as of yet) access true 16-bit character sets. Therefore all the glyphs (including all contextual forms and ligatures) must number fewer than 224.
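
The glyph limit in the last point is easy to check before starting work on a script definition. The following is a rough sketch of that arithmetic only; the inventory figures are invented for illustration, and the assumption that every contextual letter needs the same number of forms is a simplification.

```python
# Limit stated above: all glyphs, including contextual forms and
# ligatures, must number fewer than 224.
GLYPH_LIMIT = 223


def glyph_count(contextual_letters, forms_per_letter, plain_characters, ligatures):
    """Total glyphs a font would need for a hypothetical script inventory."""
    return contextual_letters * forms_per_letter + plain_characters + ligatures


def fits_in_one_font(total):
    return total <= GLYPH_LIMIT


# Invented example: 36 letters with 4 forms each, 40 digits, punctuation
# marks and diacritics, and 30 ligatures.
small = glyph_count(36, 4, 40, 30)
# Same inventory with 45 ligatures instead of 30.
large = glyph_count(36, 4, 40, 45)

print(small, fits_in_one_font(small))
print(large, fits_in_one_font(large))
```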

Status

The programming for the rendering engine is already pretty much complete.

Shoebox 3.05 beta includes basic screen support for the engine, and in West Eurasia Group we have MTTs using it already. They have been able to enter data much more accurately using this system than when they try to use a Romanization of their language. I have already developed demos of the system using Arabic-like, Hebrew, Syriac, modern Assyrian, Hindi-like, and Cherokee scripts, and can send them on request.

The next version of LinguaLinks should also have support for the SDF system via the Transduced Fonts mechanism.

Using CC for Non-Roman Text Rendering in LinguaLinks

by Peter Constable

A recent addition to LinguaLinks (LL) has been the ability to use the Consistent Changes program (CC) for complex text rendering. The idea is to have data stored in one encoding, but then to pass each string through a CC table before it gets rendered on the screen or printer. The use of CC for this purpose will be included in version 1.5, though it may not be fully documented in that version. NRSI has been working with the LL staff to assist in evaluating issues relating to implementation of this use of CC in LL. In this regard, I have recently been experimenting with this capability in LL for rendering Thai text.

Thai script, like other Indic scripts, presents issues of contextualization and ordering of characters; these create problems for rendering on the one hand, and analysis (e.g. sorting) on the other. In experimenting with CC-enabled rendering in LL, I was able to find ways to address some of these concerns. I was generally pleased with the results, though I did encounter certain limitations and problems in implementation. Here is a summary of pros and cons:

Pros:

  • Using CC for rendering text permits altering an encoding so that data can be stored in a more useful way. For example, I was able to encode Thai data so that it can be sorted correctly and also be rendered as it should be. There is also room to experiment with other encodings that permit additional benefits (e.g. being able to view data either in Thai script or in roman transliteration).
  • This implementation of CC provides enough power to handle some essential non-roman rendering issues: contextualization, glyph reordering, glyph insertion, and ligatures.
  • Performance appeared to be quite good: I was not able to detect any change in performance, though I have been working on a 200 MHz Pentium Pro.
  • Setting up a CC table for rendering was not all that difficult (though at present it does require some exposure to Cellar programming).
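
The idea of storing text in one order and transforming it just before display can be sketched in a few lines. This is written in Python rather than as a CC table, and it is not the encoding used in the experiment, which is not specified here. It assumes, purely for illustration, that Thai preposed vowels are stored after the consonant they are written in front of, and it ignores consonant clusters and tone marks. `to_display_order` is a hypothetical name.

```python
# Thai vowels that are written to the left of their consonant
# (U+0E40 through U+0E44).
PREPOSED_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def is_thai_consonant(ch):
    # Thai consonants occupy U+0E01 through U+0E2E.
    return "\u0e01" <= ch <= "\u0e2e"


def to_display_order(stored):
    """Move each preposed vowel in front of the consonant stored before it."""
    out = []
    for ch in stored:
        if ch in PREPOSED_VOWELS and out and is_thai_consonant(out[-1]):
            consonant = out.pop()
            out.append(ch)         # vowel is drawn first
            out.append(consonant)  # consonant follows it on screen
        else:
            out.append(ch)
    return "".join(out)


# Two stored words: KO KAI + SARA AI MAIMALAI, and KHO KHAI + SARA AA.
stored_words = ["\u0e01\u0e44", "\u0e02\u0e32"]
display_words = [to_display_order(w) for w in stored_words]

# A plain code-point sort of the stored forms groups words by consonant;
# the same sort on the display forms puts the vowel-initial word last.
print(sorted(stored_words) == stored_words)
print(sorted(display_words) == display_words)
```

Because the stored form begins with the consonant, a simple sort on the stored data keeps the two words in consonant order, while the display form still shows the vowel on the left.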

Cons:

The following are important limitations in this system:

  • One limitation in rendering is that this system does not provide any solution for scripts that require more than 224 glyphs. (This limitation is imposed by Windows.)
  • Since LL has no knowledge about the relationship between the encoded text characters and the glyphs displayed, it has no way to know for certain how to handle the cursor. At present, there is a workaround, though it comes short of “correct” cursor handling. As a result, it is somewhat awkward to edit text while it is being displayed using CC-enabled rendering, particularly for scripts that involve reordering.
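
The cursor problem comes down to missing information: once text has been reordered for display, the application needs to know which stored character each glyph came from. The sketch below shows one way such a map could be produced alongside the rendered string. It is a hypothetical illustration, not how LL or CC works, and it reuses the simplified assumption that a Thai preposed vowel is stored after its consonant.

```python
PREPOSED_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def is_thai_consonant(ch):
    return "\u0e01" <= ch <= "\u0e2e"


def render_with_map(stored):
    """Return (display_text, source_index).

    source_index[i] is the position in `stored` of the character that
    produced the glyph at display position i.
    """
    display, source_index = [], []
    for index, ch in enumerate(stored):
        if ch in PREPOSED_VOWELS and display and is_thai_consonant(display[-1]):
            # Insert the vowel just before the consonant already emitted.
            display.insert(len(display) - 1, ch)
            source_index.insert(len(source_index) - 1, index)
        else:
            display.append(ch)
            source_index.append(index)
    return "".join(display), source_index


text, mapping = render_with_map("\u0e01\u0e44\u0e02")
print(mapping)
```

With a map like this, a click on the first displayed glyph can be translated to stored position 1 instead of 0. Without it, the application can only guess, which is the awkwardness described above.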

The following are minor problems I encountered in the current implementation in LL:

  • If a CC table is to be used for several fonts, then it must be stored in a separate disk file; storing the table in the LL knowledge base file would require creating a separate copy of the table for each font. It would be preferable to be able to store a single copy of the table inside the LL knowledge base file which could be used with as many fonts as desired.
  • At present, there is no easy way to change the font used for displaying text. This is a particular problem since one will not want to do any extensive editing using a CC-enabled font due to problems with cursor placement. It would also be desirable to be able to change fonts easily if different font/CC table combinations are used to give different representations of the same data.
  • At present, LL has no category for characters that delimit word breaks but which do not appear as visible space between words. The closest substitute would be to designate a word break character to be a white space character, but this will not always be handled by the system as desired.

This use of CC is now one of three devices available in LL for handling text rendering issues. For further details on our evaluation of this system, contact the author and request the document “Transduced Fonts in LinguaLinks using CC”.

Apple’s NeXT OS

by Victor Gaultney

Late last year, Apple Computer announced that they were acquiring NeXT Software, Inc. and were going to use the NeXTstep/OPENSTEP environment as the foundation for future versions of the Mac OS. In the weeks since then Apple has outlined their plans for this new OS, the future of System 7.x, and key international technologies. Their hope is to produce an OS that can truly compete with Windows NT - within one year.

Rhapsody

The new OS - “Rhapsody” - will appear first in a developer release in mid-1997 that will be quite similar to the existing NeXTstep/OPENSTEP system. Based on a new version of the Mach microkernel (along with its Unix ties), Rhapsody will hold on to the best features of the NeXT OS (protected memory, preemptive multitasking, high performance) and add system-wide support for symmetric multiprocessing. It will, however, require major updates to current applications and may not include support for some current Mac technologies. Rhapsody’s “Premier” release for end users is scheduled for January 1998.

Although the current NeXTstep OS runs on many hardware platforms (including Intel), the main target for Rhapsody is PowerPC-based machines. Apple has not ruled out delivery on Intel platforms, but states clearly that the PowerPC is the priority. Even if an Intel version were to become available, it is highly unlikely that it would run Windows applications.

Tempo

In addition, Apple plans to continue to upgrade the current Mac OS 7.x with substantive additions every six months through 1998. In July, “Tempo” will include a revamped Finder (the biggest change in six years), OS-level Java support and a new version of GX that removes the greatest roadblock for some users - the GX printing architecture - while preserving the graphics and typography features.

Yellow and Blue Boxes

In mid-1998, Rhapsody and System 7.x will be unified in a single system that can run both old and new applications. Rhapsody “Yellow Box” applications will run as before, with all the benefits of memory protection, etc. System 7.x apps will run in a “Blue Box” - a fully hosted implementation of System 7.x within one Rhapsody memory space.

This means that 7.x applications, control panels and system extensions will run unmodified in the new OS. Apple has already shown applications such as Photoshop and Word 6 running within an early version of the “Blue Box” - at speeds comparable to current mid-range systems.

Apple has said that delivery of Rhapsody on Intel platforms is likely, but states clearly that the PowerPC is the priority. Such delivery would be two-pronged: you would be able to run the whole Rhapsody OS (minus the Blue Box) on an Intel chip, or you could run individual Rhapsody applications on Windows NT. Neither case allows for running current Mac applications on Intel hardware, though.

WorldScript

Apple is very sensitive to their international market, so they have already stated that WorldScript (in some form) will be one of the few current technologies moved into Rhapsody. The NeXT OS is already based on Unicode but does not have a sophisticated rendering engine or locale resources, items that could be migrated from WorldScript. It is likely, however, that the format of those resources will change.

GX

Rhapsody, like NeXTstep, will use Adobe’s Display PostScript (or a descendant) as the graphics model. QuickDraw GX will be dropped even though it is acknowledged to be a more powerful and sophisticated environment. Key features of GX, however, will migrate to Rhapsody. GX line layout and ColorSync color management will be added in quickly (to make up for DPS’s weaknesses), and Apple hopes to graft on other features (such as translucency) at some future time.

Impact on SIL

It is still unclear what impact this change of direction will have for our non-roman work. What is clear is that solutions developed for the current Mac OS 7.x will continue to be supported by Apple into 1999, and that Apple continues to take its international community very seriously.

The bad news here is that Rhapsody may require us to revise our solutions and tools to support new formats for fonts and international resources. It also may mean that few, if any, new GX-based applications will be written and the current ones may receive only incremental updates (although Apple’s move to eliminate the GX printing architecture is intended to encourage more GX development).

The good news is that there seems to be no change in Apple’s global philosophy: the user should be able to use their preferred language, writing system and cultural preferences (date/time/etc.) anywhere in the system and throughout all applications. This means that we are likely to continue to see OS-wide support for multilingual computing, with enough flexibility to support minority languages and scripts.

In addition, the most important part of GX (for our purposes) - GX line layout - will be grafted onto Adobe’s PostScript environment. This link with Adobe will likely mean a greater competitive edge in the graphic arena and mean that international technologies (like WS and GX line layout) may gather more support in mainstream applications.

So at this point, the situation looks promising, but the real test will be to see whether Apple can deliver on their intentions quickly enough to hold onto their market.

See also: The Future of the Mac OS, Victor Gaultney, NRSI Update #3, November 1996.

Non-Roman Optical Character Recognition

by Dennis Drescher

Optical Character Recognition (OCR) is the process of scanning a printed document and then, through software, turning it into a computer document that can then be reproduced or manipulated. In the past, IPub has done some research in this area. The re-keying of text for producing reprints and revisions of long, multi-section documents published before the current generation of computer technology is a very labor-intensive task. It was hoped that OCR might offer some relief. However, the error rate was too high and the time saved over re-keying would be expended in editing.

This past fall I was asked to look into the feasibility of non-roman OCR. If it was feasible, the benefits of having NR OCR would be great for those who do linguistic work and other tasks involving non-roman scripts. The collection and analysis of that data would be greatly aided by NR OCR. There is much potential.

Because this research was initiated by someone working in a particular field we decided to use the script from that field for our research. The script was Devanagari, one of the more difficult scripts because of the headstroke line that joins characters together to form word units.

In the course of the initial hardware and software setup, I needed to contact the publisher of the software I was using, OmniPage Pro by Caere Corporation. As I spoke to the technical services representative and tried to explain the project I was involved in, I knew I was in trouble: the person on the other end of the line kept referring to non-roman scripts as “irregular” characters.

This world view seemed to be the basis for this software package. OmniPage Pro doesn’t seem capable of NR OCR. There is a training routine that you can initiate but if the material is not roman-based it yields bizarre results. It would seem, from my point of view, that the current architecture for digesting the graphical image and rendering the encoded text is based on the assumption that the text consists of distinct roman-based glyphs.

For NR OCR to be successful, the OCR software cannot be script specific. It must be capable of being trained at the individual glyph level. Another shortfall I noted was that current OCR technology only supports a one-to-one glyph-to-code-point association. This is unworkable for many NR scripts that involve composite glyphs or large orthographies.

I am aware of an Arabic version of OmniPage Pro and this may be of interest to people using that script. However, as far as I know it is Arabic-specific and cannot be modified to work with any other scripts. At this point it would appear that NR OCR using commercial U.S. software is, for the most part, not yet feasible. However, I realize that my experience is limited. Maybe someone in the NRSI Update readership might have had some experience in this area. If so, would you be willing to share it with me? Are you aware of any software that can do NR OCR? Please write me and tell me about it.

Circulation & Distribution Information

The purpose of this periodic e-mailing is to keep you in the picture about current NRSI research, development and application activities.


© 2003-2018 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.