WSTech: Writing Systems Technology (formerly known as NRSI)

Short URL: https://scripts.sil.org/NRSIUpdate05

NRSI Update #5 – April/June 1997

NRSI staff, 1997-06-01

Welcome to issue #5 of the NRSI Update!

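In this update:

  • Change in NRSI Management
  • Developing Non-Roman Rendering Capabilities for Windows (“WinRend”)
  • Apple Directions
  • Unicode Capability in Microsoft Word 97
  • Circulation & Distribution Information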

Change in NRSI Management

by Margaret Swauger

For a number of years I have done two large and very different jobs: Manager of the NRSI, and wife and mother of three. In the last few months my family responsibilities have become heavy and difficult, requiring me to step down as NRSI manager. Peter Constable has taken my place and I will be assisting him in the areas of planning and resource development. All correspondence intended for the NRSI manager should be sent to Peter Constable.

To date, Peter Martin has distributed four issues of the NRSI Update electronically, at approximately two-month intervals (often held back in order to report on certain events). In light of increasing web access and HTML usage, starting with Issue #3 these have been distributed as HTML documents (with a text-only version for those who ask) to provide a compact format with some layout control, navigation among sections and issues, and links to relevant external sites.

Developing Non-Roman Rendering Capabilities for Windows (“WinRend”)

by Peter Constable

For many years, we have all awaited the development of technologies on the Windows platforms that would enable us to do multilingual work and, in particular, to work with non-roman language data. There was great anticipation that Microsoft’s TrueType Open (TTO) technology might be the answer, but it had become apparent by late summer 1996 that TTO was going to fall well short of what we need.

At the Non-Roman Technical Consultation in September 1996, after the disappointing facts regarding TTO had been digested by all, some participants began to discuss the possibility of SIL developing a rendering engine that could run as a layer on top of Windows and that would have the flexibility to work with any writing system. (For now, this software-to-be has been dubbed “WinRend”.) Some important problems were noted, as were some of the challenges that would have to be faced, but the benefits were readily apparent.

For various reasons, the NRSI/Dallas team had not felt that there was a clear mandate to pursue development of WinRend at that time. At the CTC in November, however, we got together with all of the representatives from entities for which non-roman scripts are a major concern to get their input on how the problem of poor non-roman support on Windows should be addressed. During this discussion, the suggestion of developing WinRend came up again. Among the representatives of field entities, there was consensus that we should pursue development of WinRend, even if it only works with SIL applications. (This, in fact, was the main consideration behind a resolution passed by CTC in relation to the work of NRSI.)

At the recent meeting of the NRSI Field Advisory Board, the NRSI/Dallas team was given a very clear mandate with regard to WinRend: the Board identified development of WinRend as the top priority for NRSI in the coming years. Accordingly, we are publicly announcing our intent to pursue development of WinRend.

So, what exactly is WinRend? We have not yet identified all of the functionality it should have, but we have some definite ideas: at least, it should

  • support both left-to-right and right-to-left scripts;
  • support contextualization, positioning, ligatures and reordering;
  • support large-character-set scripts (scripts that require between 224 and 2000 glyphs), and possibly also very-large-character-set scripts (those that require tens of thousands of glyphs);
  • support cursor placement and text selection; and
  • provide the above capabilities to any SIL Windows application that has been written to support it.

WinRend is intended to be a long-term solution to problems of rendering non-roman scripts on the Windows platforms. There are likely to be some limitations; for example, it is quite possible that WinRend would not provide sufficient capability for handling vertically-oriented text.

There are two existing partial solutions for rendering that are being used in versions of LinguaLinks and Shoebox that either have been or will soon be released: CC (used in LinguaLinks), and SDF (used by both LinguaLinks and Shoebox). CC is powerful in terms of its ability to handle non-roman behaviours, but it provides no means for applications to handle cursor placement and text selection (see “Using CC for Non-Roman Text Rendering in LinguaLinks”, NRSI Update #4). While SDF does provide ways to handle cursor placement and text selection, the author, Timm Erickson, has noted that SDF has certain limitations (see “Introducing the SDF-based Rendering Engine”, NRSI Update #4). WinRend is intended to overcome such limitations.

Work on WinRend has not yet begun, but we have had some initial planning discussions. We have also interacted with the LinguaLinks and Shoebox developers and have assurance of their desire to cooperate with us in this development. Before any actual software development can begin, we will need to do a significant amount of research, both on requirements and on possible technologies to utilize.

Please note that, at this time, we are not able to make any guarantees. We have enough confidence in the feasibility of this project to start work on it, but we don’t yet know what obstacles will present themselves. We hope to be able to publicly display a product at the next CTC, and preferably sooner.

Apple Directions

by Victor Gaultney

Over the past year, the whole future for Apple has changed dramatically. Jonathan Kew and I attended Apple’s WorldWide Developer Conference from May 13-16 to investigate their new strategies and encourage them to make multilingual and multi-script computing more accessible, especially to those using minority languages and scripts.

In general, Apple is continuing development of the current Mac OS for many years to come while preparing a new operating system, “Rhapsody”, for the long-term future. Rhapsody is essentially the existing OpenStep OS from their buyout of NeXT, with a more Mac-like interface and additional functionality migrated from the Mac OS. The first developer release of Rhapsody is expected in July, with more complete versions due in January and July of 1998.

Three key aspects of Rhapsody make the OS very attractive:

  • First of all, Rhapsody is based on a tested, shipping OS that already runs on Intel platforms, meaning that a rapid release for the Mac community is feasible (in fact, a great deal of work on bringing a new Mac-like Rhapsody to the PowerPC has already been completed).
  • Secondly, applications written for Rhapsody will run on Rhapsody, Windows NT, Windows 95 and the current Mac OS.
  • Finally, Rhapsody is well poised for international software development: it is Unicode-based, already supports “smart” rendering and will directly support the QuickDraw GX smart font format. These multilingual capabilities will also work identically across platforms.

Most developers (including ourselves) came to the conference a bit skeptical about the radical new plans Apple had laid out. By the end, though, the mood was quite upbeat. Apple had managed to convince developers that supporting Apple’s platforms was a worthwhile proposition - that the current Mac OS was not dead, that a new, exciting, competitive OS was well along, and that they could write apps to the Rhapsody APIs and deliver them on all major platforms. It also seemed that while Apple’s financial troubles were quite real, a turnaround was already taking place.

We were quite encouraged to see Apple’s international focus maintained, and that our current GX fonts would be directly supported in the Rhapsody text system — the main text system used by most applications. We were very impressed and excited about Rhapsody - much more so than we expected to be. Without exaggeration, Rhapsody looks to provide the most robust, fully cross-platform development system for building powerful multilingual applications that is available. It has great potential for meeting our non-roman script needs in SIL.

Unicode Capability in Microsoft Word 97

by Peter Constable

You may have heard rumors that Microsoft Word 97 provides support for Unicode-encoded text. This is, indeed, the case. This latest version of Word uses a new file format (don’t attempt to read these files in earlier versions) in which all text is stored as 16-bit values encoded in Unicode. I have done some testing of the US version of Word 97. What I describe here may not be entirely true of localized versions.

Unicode support in Word 97 means that Word is now improved in its ability to handle multilingual text. It does not mean, however, that it is able to handle non-roman behaviours such as ligatures or positioning. (Presumably, localized versions of Word 97 running on the corresponding localized version of Windows 95 will support whatever behaviors that version of Windows 95 is designed to support.) Also, Word 97 is designed to support only languages that are defined within Unicode.

A nice little extra is also included, at least in the Office 97 package: a selection of Chinese, Japanese and Korean fonts. Since these scripts do not involve behaviors such as contextualization, Word can render any of the thousands of characters, so that it is possible to view a CJK document that is encoded in Unicode. (Word 97 does appear to have some knowledge of double-byte encodings - see below - but I don’t know whether it can read a CJK document that is double-byte encoded.) No facility is provided, however, for entering CJK text. One still needs a Far East version of Windows 95 for that.

With the multilingual extensions installed in Windows 95, it is possible to enter text in Word in a number of different languages that are supported. The interface is quirky, however: specifying a language, say Greek, by going to the Tools/Language/Set Language dialogue in Word does not mean that subsequent text that is entered will be in Greek. Apparently, all this does is to specify that Greek proofing tools (e.g. spell-checking) should be used for that text. To actually get Greek characters, you need to use the dialogue that is accessed from the icon tray on the Windows 95 taskbar to change the keyboard. (This only appears if the Windows 95 multilingual extensions are installed.)

For those who have been working with non-roman script data using KeyMan, this is still possible in Word 97. You must make sure that the Windows US English keyboard is selected (if you have multilingual extensions installed). Even though you may be seeing Thai or Devanagari on the screen, Word will think that it is working with the “ANSI US” (Latin 1) codepage (i.e. the default character set for US Windows 95). The text will be stored in 16-bit Unicode encoding, with each character falling in the Latin-1 range of Unicode (and not the Thai or Devanagari range).
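The effect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the byte value is hypothetical, and the sketch assumes a custom-encoded Thai font whose layout happens to match Windows codepage 874, which may not be true of any particular KeyMan setup.

```python
# Hypothetical byte produced by a KeyMan keyboard for a custom-encoded font.
raw = bytes([0xA1])

# How Word 97 would store it: the byte is treated as "ANSI US"
# (codepage 1252) text, so it lands in the Latin-1 range of Unicode.
as_word_sees_it = raw.decode("cp1252")

# What a genuinely Unicode-encoded Thai document would hold instead,
# ASSUMING the font follows the codepage 874 layout.
as_thai = raw.decode("cp874")

print(f"stored by Word: U+{ord(as_word_sees_it):04X}")
print(f"true Thai text: U+{ord(as_thai):04X}")
```

The two code points differ even though the same glyph appears on screen, which is why such documents cannot be exchanged as genuine Unicode Thai or Devanagari text without a conversion step.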

Word 97 allows you to save a file into the native Word 97 format, or into several other formats, including Word 2.0, Word 6.0/95, RTF, HTML, MS-DOS text and Unicode text. (Note that if you save a file in Word 6.0/95 format, what you actually get is an RTF file with a DOC file extension. By mid summer, there should be a free upgrade that will be able to save a file in true Word 6.0/95 format.) If you have non-English text in your document, saving in different file formats gives varying results:

Unicode text: This gives exactly what you expect: Formatting is stripped away leaving only the text, which is stored as 16-bit Unicode.
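As a rough illustration of what “16-bit Unicode” means for the Greek sample word used in these tests, the following Python sketch encodes it as two bytes per character. The little-endian byte order is an assumption; the exact byte order and any leading byte-order mark written by Word 97 were not checked here.

```python
# The Greek sample word used in the tests described in this article.
text = "Αγαπη"

# Encode as 16-bit units (little-endian byte order is ASSUMED here).
data = text.encode("utf-16-le")

# Split the byte string into one 2-byte unit per character.
units = [data[i:i + 2] for i in range(0, len(data), 2)]

for char, unit in zip(text, units):
    # Show the character, its code point, and the two stored bytes.
    print(char, f"U+{ord(char):04X}", unit.hex())
```

Every character in this word occupies exactly two bytes, regardless of which script it belongs to.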

RTF or Word 6.0/95 (actually RTF) formats: Before a run of non-English, non-CJK text, the RTF file will have a tag indicating the alternate character set and then will store characters in the run as 8-bit, hexadecimal values. I don’t know how many different character sets are supported, or how these character sets compare with common character set standards. (For example, for Cyrillic text the tag stored is \lang1049; the Windows codepage for Cyrillic is 1251, and the MS-DOS Cyrillic character set is 855.)
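A minimal sketch of the 8-bit hexadecimal storage, in Python. It assumes that the bytes follow Windows codepage 1251 and are written with RTF's \'xx escape; the helper name rtf_hex_escapes is hypothetical, and Word's actual output was not compared byte for byte. (As a side note, 1049 is the Windows language identifier for Russian, 0x0419, rather than a codepage number.)

```python
def rtf_hex_escapes(text: str, codepage: str = "cp1251") -> str:
    """Return text as a run of RTF \\'xx escapes (hypothetical helper).

    Each character is first converted to a single byte in the given
    8-bit codepage, then written as a two-digit hexadecimal escape.
    """
    return "".join(f"\\'{byte:02x}" for byte in text.encode(codepage))


# The Cyrillic sample string used in the tests described in this article.
escaped = rtf_hex_escapes("Фываолдж")
print(escaped)
```

Under these assumptions each Cyrillic letter becomes one byte in the range 0xC0 to 0xFF, which is why the run is unreadable unless the reader knows which character set applies.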

For a run of CJK text, things are a little more complex. It appears that the RTF file stores each character twice: first in double-byte encoding (represented as hexadecimal values) and then in 16-bit Unicode (represented as decimal values). Here is a sample of RTF code for a run of Chinese text:

\f2\fs20\cgrid {\fs40 \loch\af37\hich\af2\dbch\f37 
\uc2\u20224\'c9\'b1\u20225\'a5\'f8\u20226\'c9\'b5\hich\f2\uc1\u20227\'3f}

In this example, the keyword \f37 is referencing one of the Chinese fonts included with Office 97. Note that several new RTF keywords are used here: \cgrid, \loch, \hich, \dbch, \uc, \u.
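To make the structure of such a run more concrete, here is a small Python sketch that pulls the Unicode values out of the \u keywords and ignores the hexadecimal bytes that follow each one. The sample string is written with the backslashes that RTF requires, which is an assumption about how the sample originally looked, and the function name unicode_from_rtf is hypothetical. This is not a general RTF parser.

```python
import re

# The Chinese run from the sample, with RTF backslashes written out
# (an ASSUMED reconstruction of the original markup).
SAMPLE = r"\uc2\u20224\'c9\'b1\u20225\'a5\'f8\u20226\'c9\'b5\hich\f2\uc1\u20227\'3f"


def unicode_from_rtf(fragment: str) -> str:
    """Collect the characters named by \\uN keywords (hypothetical helper)."""
    chars = []
    # "\\u" followed by an optional minus sign and digits; "\uc2" does not
    # match because a letter, not a digit, follows the "u".
    for match in re.finditer(r"\\u(-?\d+)", fragment):
        value = int(match.group(1))
        if value < 0:
            # RTF writes values above 32767 as negative 16-bit numbers.
            value += 65536
        chars.append(chr(value))
    return "".join(chars)


decoded = unicode_from_rtf(SAMPLE)
print([f"U+{ord(c):04X}" for c in decoded])
```

In the sample, \uc2 announces that two fallback bytes follow each \u value, and \uc1 switches to a single fallback byte. The final fallback byte, 3f, is an ASCII question mark.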

Word 2.x: Non-English text may or may not be maintained: Text in a language supported by the multilingual extensions (e.g. Greek or Russian) is converted to 8-bit data that is encoded according to the Windows codepage for that language. This text will still appear as it originally did. Text in other languages, including CJK, is also converted into 8-bit data, but without a Windows codepage to support that language, the text will become illegible Roman characters.
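The way 8-bit data turns into illegible Roman characters can be illustrated with a short Python sketch. It assumes Windows codepage 1253 for Greek and shows what happens when the same bytes are read with the US codepage 1252 instead; it does not reproduce Word's own conversion logic.

```python
# The Greek sample word used in the tests described in this article.
greek = "Αγαπη"

# Convert to 8-bit data using the Windows Greek codepage (ASSUMED: cp1253).
stored = greek.encode("cp1253")

# Read back with the matching codepage: the text appears as it did before.
right = stored.decode("cp1253")

# Read back with the US codepage: the same bytes show up as Roman characters.
wrong = stored.decode("cp1252")

print(stored.hex(" "))
print(right)
print(wrong)
```

The bytes themselves never change; only the codepage used to interpret them does, so an 8-bit file is legible only when the reading system applies the same codepage that was used to write it.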

MS-DOS text: Non-English text is not maintained: A few characters may be transliterated in a sensible fashion (e.g. in the test I did, Greek alpha and pi were converted to “ap”) but everything else is changed to a question mark.

Αγαπη

Фываолдж

伀企伂伃

I tried to view an HTML file generated by Word that contained the text shown above in Netscape Navigator and in MS Internet Explorer. It is interesting to note that, in both cases, the text did not display correctly.

It is likely that Unicode will continue to grow in importance for our work. For that reason, it is nice to see Unicode capability in Word. Also, it is nice that there is now a way to work with large character sets in Word. We still have some major obstacles to overcome for handling multilingual data on the Windows platforms, however:

  • rendering text in which contextualization, ligatures, etc. are involved is still a problem;
  • while Word may be able to display Unicode-encoded text, we still need the ability to enter such text beyond what the Windows 95 multilingual extension keyboards provide;
  • we need ways to create fonts that contain very large collections of glyphs and that work with Unicode-encoded data; and
  • we need to deal with the fact that most of the languages we work with, being minority languages, are not and may never be supported by Unicode.

Circulation & Distribution Information

The purpose of this periodic e-mailing is to keep you in the picture about current NRSI research, development and application activities.

© 2003-2018 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Writing Systems Technology team (formerly known as NRSI).