NRSI: Computers & Writing Systems

Short URL: http://scripts.sil.org/NRSIUpdate14

NRSI Update #14 – October 2000

Lorna A. Priest (ed.), 2000-10-01

In this issue:

Introduction to TECkit—An encoding conversion toolkit

by Jonathan Kew

Why TECkit?

Unicode is being adopted as the character encoding standard for an increasing amount of software, both in the commercial world and within SIL. However, SIL has a lot of existing data encoded according to a variety of other “legacy” encodings. Some of these are “standard,” defined by international bodies or major companies; examples include ASCII, various ISO standards, Windows codepages, Mac OS Script Manager encodings, etc. Others, however, are “in-house” encodings created for a particular SIL entity, language team, or project where “special characters” were needed and a custom font was created.

For data in standard encodings, we can reasonably expect that operating systems and/or applications, as they move to Unicode, will provide the ability to convert existing data into the new encoding. In many cases, this happens automatically as part of the process of interpreting an old file format by a new version of an application; the user may not even be aware that an encoding conversion is happening.

For custom encodings, the situation is different: we cannot expect that software will know how to map between our private encodings and Unicode unless we somehow describe the mapping required. In many cases, we do not even have an explicit description of the encoding; it exists only in the sense that someone picked certain locations in a font for the “special characters” required. And to complicate matters, much of our data is not tagged in any way to indicate its encoding. In the case of formatted (styled) documents, the font used for a text run may imply a certain encoding, but that encoding may only have an implicit “existence” by virtue of the arrangement of glyphs in a font. And some of our data is in plain text files, where there is even less opportunity to reliably identify the encoding(s) used.

In order to successfully use legacy data in the new world of Unicode-based software, then, several things are needed: explicit descriptions of legacy encodings and their relationship to Unicode, where non-standard encodings have been used; configurable conversion software that can use such descriptions to map between the old encodings and Unicode; and identification of the encoding of each piece of data. Sometimes this may be at the level of a complete file; in other cases, a single file includes data in many different encodings (for example, in different fields of a Shoebox database).

TECkit represents a first step towards solving encoding conversion issues. As its name suggests, it is not a single piece of software, but rather a “toolkit” providing a collection of tools that can assist in text encoding conversion tasks. In some cases, TECkit tools may be used as-is; in other cases, they may be used as building blocks for higher-level processes.

The primary “tools” offered by TECkit are:

  • a portable binary file format for describing the mapping from a legacy encoding to Unicode and vice versa; and
  • conversion functions that can use such mapping files to convert data between legacy encodings and Unicode (in either direction).

These are the essential building blocks that can be seen as a foundation for complete encoding conversion solutions. In addition, TECkit includes:

  • tools (“compilers”) to build mapping files from human-readable descriptions;
  • utilities for both Windows and Mac OS that allow users to apply conversions to plain text files.

TECkit does not address issues of document markup or files that include data in multiple encodings; it addresses only the basic task of converting a chunk of data from one encoding to another. However, the conversion functions provided could be used by a higher-level tool that understands the structure of a particular kind of document (such as a Shoebox database) and applies different mappings to different portions of the data.

Software tools

TECkit includes two levels of software tools: “low-level” conversion libraries that actually apply mappings to chunks of data, and end-user utilities. The conversion libraries have no user interface and cannot be used directly; they provide a data conversion service that can be called by “client” applications. The end-user utilities are simple examples of such clients; they provide an interface allowing the user to specify input and output files and the encoding conversion to be performed, and then call the conversion libraries to actually process the data.

Windows

For Windows, the conversion library is provided as a dynamic link library, SILtec.dll. This includes functions to convert between legacy encodings and Unicode, and also to convert between various representations of Unicode (UTF-8, UTF-16) and between big- and little-endian formats.
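To make the difference between Unicode representations concrete, here is a small Python sketch using the standard codecs. It does not call SILtec.dll; the helper names are hypothetical and serve only to show what conversion between UTF-8, UTF-16, and the two byte orders involves.

```python
text = "\u014B"  # LATIN SMALL LETTER ENG, U+014B

# The same character in three representations.
utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")  # little-endian: low byte first
utf16_be = text.encode("utf-16-be")  # big-endian: high byte first

print(utf8.hex(), utf16_le.hex(), utf16_be.hex())  # c58b 4b01 014b


def swap_utf16_byte_order(data: bytes) -> bytes:
    """Swap each pair of bytes, turning UTF-16LE into UTF-16BE or back."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)


def utf8_to_utf16(data: bytes, big_endian: bool = False) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 in the requested byte order."""
    codec = "utf-16-be" if big_endian else "utf-16-le"
    return data.decode("utf-8").encode(codec)


assert swap_utf16_byte_order(utf16_le) == utf16_be
assert utf8_to_utf16(utf8, big_endian=True) == utf16_be
```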

There is also a simple Windows application, SILtec.exe, that can perform encoding conversions of entire files; this can be controlled through a graphical interface (for interactive use) or via command-line options (suitable for integration in batch files, for example). (See separate article.)

Mac OS

On Mac OS, the conversion library is implemented as a plug-in module for the Mac OS Text Encoding Converter (TEC), a component of the operating system. TEC “knows about” many standard encodings; the SIL Encoding Plugin provided as part of TECkit allows this knowledge to be extended to user-defined encodings via TECkit mapping tables.

There is a simple Mac OS application, Q-TEC, that allows users to select any encodings supported by TEC (including those added via the TECkit plugin) and perform conversions among them. Q-TEC supports an AppleScript interface in addition to the interactive GUI; this can be used to set up "batch" conversions as part of a larger workflow.

Encoding descriptions

At the heart of TECkit is its encoding mapping file format (“TECkit binary format”). This is a file format that has been designed to support a wide range of mapping needs, while being reasonably easy and efficient to implement in a conversion tool. The TECkit format is defined in a document available from the NRSI, but users do not need any knowledge of the internal format in order to use the tools.

The same binary format is used by the conversion engines on both Windows and Mac OS, so the mapping files may be freely shared between machines (if the same legacy encoding is in use on both platforms).

Binary mapping files are created by “compiling” a text description of the mapping. Two forms of this description are planned. One is to be an XML-based format, based on a currently evolving Unicode standard for mapping descriptions. As the XML-based format can be rather cumbersome to work with, there is a second syntax based on simpler mapping rules, somewhat reminiscent of CC, Keyman, SILKey and similar systems. Either syntax may be used to describe any given mapping, according to the preference of the mapping author; the resulting binary files are equivalent and interchangeable.

The current compilers are implemented as Perl scripts; this means that it is necessary to have Perl installed in order to build TECkit mapping files. If this proves to be a serious obstacle, stand-alone compilers could be developed, but it is unclear whether this will be necessary. (Note that Perl is not required in order to use the resulting mapping files. It is expected that in many cases mapping files, just like fonts, keyboards, etc., will be developed by specialists on behalf of end users.)

We expect to release TECkit by CTC (the JAARS Computer Technical Conference) this November.

SILtec GUI

by Ryan Eberly

Hello, my name is Ryan Eberly, and I am from Lancaster County, PA. I attend Houghton College where I am double majoring in Computer Science and Mathematics. Linguistics and SIL’s work have interested me since high school so I was excited when I had the opportunity to come and work with the NRSI this summer. Jonathan Kew has developed a conversion tool for converting text into various encodings (see Jonathan Kew’s article), and it became my assignment to work on developing a Graphical User Interface for that tool.

I’ve had a wonderful time here; the folks here at Dallas are friendly and willing to help. Peter Constable was particularly gracious in working with me as I struggled to learn about Unicode, PUA, BOM, and WATAM (What All The Acronyms Mean). Thanks, Peter!

There were times of stress, too; four weeks into the summer, I was beginning to feel like all I had done so far was talk to people and learn about Unicode. I wanted to be producing something. But as work on the GUI began, that soon changed. I came to Dallas knowing C++ fairly well (Java being my language of choice), but I was completely unfamiliar with programming in Windows. I think I can say that this is no longer the case ;-). Another interesting point came three days before I left to go back home: we discovered that a significant portion of the conversion process did not function the way I had thought it did. Thankfully, that issue has been resolved, and the GUI seems to function well now.

This summer here in Dallas has been a great growing time for me. I think my experiences this summer will help to further direct me in my life. I’m not sure what the future may hold, maybe a position in SIL’s LSWD? Regardless, I’m thankful for the experiences I’ve had this summer.

Problems with language identification

by Peter Constable

In linguistic software, it is important for us to be able to label data to indicate its language. This is needed so that appropriate resources can be used (e.g. the correct keyboard and font), so that appropriate processing can be performed (e.g. so that the correct morphological parsing and lexical database are used when interlinearizing text), and simply to document, that is, to keep track of what data is in what language.

In the past, we have done this in Standard Format Marker (e.g. Shoebox) documents or in LinguaLinks databases using SIL Ethnologue three-letter codes. We are on the verge of transitioning to new technologies, however, using new standards. One of these in particular is XML. Within XML, language identifiers must conform to a certain Internet standard. The current version of this standard is defined in RFC 1766.

This standard specifies a syntax for string identifiers, and also specifies a particular list of identifiers that are to be used. It does allow for the creation of “private-use” identifiers, which begin with “x-”. Thus, we can always be assured of the ability to create identifiers built from Ethnologue codes, for instance “x-nlg” or “x-sil-nlg” for the Gela (Nggela) language of the Solomon Islands. Increasingly, however, we need to work with others outside of SIL, and there is increasing interest in providing electronic archives of linguistic data online, as part of our academic contribution to the field of linguistics. For these situations, privately-defined identifiers are not really satisfactory. We need to use identifiers that are part of a common standard.
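The syntax in question is simple enough to sketch. RFC 1766 defines a tag as a primary tag followed by optional subtags, each made of one to eight ASCII letters and joined by hyphens. The Python below is a minimal illustration under that reading; the function names and the `namespace` parameter are hypothetical, introduced only to show how tags such as “x-nlg” or “x-sil-nlg” could be built from an Ethnologue code.

```python
import re
from typing import Optional

# RFC 1766 syntax: a primary tag plus optional subtags,
# each 1 to 8 ASCII letters, separated by hyphens.
TAG_PATTERN = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def is_wellformed_tag(tag: str) -> bool:
    """Check only the syntax, not whether the tag is registered anywhere."""
    return TAG_PATTERN.fullmatch(tag) is not None


def is_private_use(tag: str) -> bool:
    """Private-use tags have the primary tag 'x' followed by subtags."""
    return is_wellformed_tag(tag) and tag.lower().startswith("x-")


def private_tag_from_ethnologue(code: str, namespace: Optional[str] = "sil") -> str:
    """Build a private-use tag from a three-letter Ethnologue code.

    The optional namespace subtag is an illustrative convention, not
    something the standard requires.
    """
    if not re.fullmatch(r"[A-Za-z]{3}", code):
        raise ValueError("expected a three-letter code")
    parts = ["x"] + ([namespace] if namespace else []) + [code.lower()]
    tag = "-".join(parts)
    if not is_wellformed_tag(tag):
        raise ValueError(f"{tag!r} is not a well-formed tag")
    return tag


print(private_tag_from_ethnologue("NLG"))                  # x-sil-nlg
print(private_tag_from_ethnologue("nlg", namespace=None))  # x-nlg
```

Such tags are syntactically valid, but as private-use identifiers they carry no shared meaning outside the group that defined them.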

This presents a serious problem for us, however: the existing list of identifiers specified for use by RFC 1766 is mostly drawn from an international standard, ISO 639 parts 1 and 2. Although this is an international standard (or, more correctly, because of this), it currently provides coverage for only about 400 languages, far short of what we need for work among linguistic minorities.

In an effort to address these concerns, Gary Simons and I have been working to create interest within the information technology industry in getting these standards extended to provide comprehensive coverage for the world’s languages. To this end, we co-authored a paper that was presented at the recent Unicode conference (see Unicode conference article). The slides from the presentation will soon be available on the Unicode web site (look for session C3). Our paper is already available from the SIL web site, and will be made available on the Resource Collection 2000 CD-ROM distributed at the JAARS Computer Technical Conference this November.

There are several remaining challenges to getting a comprehensive list of language identifiers adopted. We were pleased to see considerable interest from several people in addressing this problem, and this support has been encouraging. There is not yet a consensus, however, on the value of the Ethnologue as a source of categories to include in a standard, nor is there consensus on the particular approach to solving the overall problems that Gary and I proposed. Some people in industry have the mindset that additions to a standard list of language identifiers can only be made in certain ways, ways that we contend are not really suited to languages and that favour languages of major commercial interest. It is not yet clear to us how to get beyond such obstacles.

One thing is clearly needed, however: to establish a mapping defining the relationships between the categories in the Ethnologue and those in ISO 639-x. We have begun to work on this, and have received an encouraging response from an ISO representative on doing this in collaboration. For now, what we can do is to work with contacts such as this who are open to working with us, and with those who share our vision for a comprehensive list of language identifiers.

WM_UNICHAR: A new Windows System Message

by Peter Constable

You may be aware that Microsoft Windows NT and Windows 2000 have more complete support for Unicode than do Windows 95/98/Me. One key area of difference has to do with keyboard input. In a nutshell, some applications, including any application running on Windows 95/98/Me, are able to accept characters from only certain ranges of Unicode. This limitation is defined in terms of Windows code pages: these applications are limited to only characters that are in some Windows code page that is supplied with the OS. (Some applications are specifically limited to only those characters in the default system code page.)

The source of the limitation has to do with the semantics of a certain Windows system message: WM_CHAR. This is the message that is used within Windows to send individual characters to a program. Under the right conditions, the message can contain any Unicode character, but in the situations described above, it is only capable of sending an 8-bit character. Unfortunately, on Windows 95/98/Me, there is simply no way to get around that limitation. If it were not for that obstacle, we could be using Word 2000 on Windows 98, for instance, to enter data in any Unicode character range.
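The effect of the 8-bit restriction can be illustrated outside Windows. The sketch below is not Windows API code: it uses Python's bundled cp1252 and cp1251 codecs as stand-ins for two Windows code pages, and the helper name is hypothetical. It simply shows which characters can travel as a single byte in a given code page and which cannot.

```python
def fits_in_code_page(char: str, code_page: str = "cp1252") -> bool:
    """Return True if the character can be sent as one byte in the code page."""
    try:
        return len(char.encode(code_page)) == 1
    except UnicodeEncodeError:
        # The code page has no slot for this character at all.
        return False


samples = {
    "\u00E9": "LATIN SMALL LETTER E WITH ACUTE",
    "\u014B": "LATIN SMALL LETTER ENG",
    "\u0416": "CYRILLIC CAPITAL LETTER ZHE",
}

for char, name in samples.items():
    print(f"U+{ord(char):04X} {name}: "
          f"cp1252={fits_in_code_page(char, 'cp1252')}, "
          f"cp1251={fits_in_code_page(char, 'cp1251')}")
```

Of these three, only the accented e fits in the Western code page and only the Cyrillic letter fits in the Cyrillic one; the eng fits in neither, so no choice between these two code pages lets it through as an 8-bit character.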

This spring, we got to thinking about how that obstacle could be overcome. The challenge was this: how to send a character that was guaranteed always to be Unicode, regardless of factors such as whether you’re running on NT or Windows 9x; how to provide this kind of capability without requiring any changes to the operating system (important so that it would be available on existing systems); and how to implement this so that it would work in applications that were built to support it, but not interfere with applications that didn’t.

It turned out that there is a very easy solution. This involved simply creating a new Windows message, WM_UNICHAR, that was guaranteed to always send a Unicode character. There are some additional implementation details that are needed to specify exactly how it works, but it is essentially that simple.

During the Unicode conference in Amsterdam, Bob Hallissy and I were able to present this proposal to some of our contacts from Microsoft. One of them got particularly excited: “You don’t realize how much we need this in NT.” I had only been thinking of it in terms of a solution for problems on Windows 9x, but the need extended to Windows NT4 and Windows 2000 as well. They took this idea back to Redmond, and it has since been officially adopted as a part of Windows.

As mentioned, this interface doesn’t require any particular changes to the operating system. It does, however, require relatively minor changes to any applications or keyboard handlers that want to support it. Some MS application development teams intend to support it in their next versions; I don’t yet know exactly which ones, though. It will very likely be supported in the first FieldWorks applications, however, and it is already supported in the beta versions of Keyman 5.0.

2000 Apple WorldWide Developer Conference

by Jonathan Kew

Last year at Apple’s WorldWide Developer Conference, Mac OS X was a promise for the future. This year, it is a reality. Mac OS X was clearly the major theme of the conference, with Apple encouraging developers to adopt the new platform. Just as Victor reported last year, Apple is still on track (the same track!). As such, WWDC 2000 was not so much a presentation of new plans as an update on progress along the already-known track, and an encouragement to move forward with Apple; the key message of this year’s conference was that it is time now to be updating products for Mac OS X, as the system is here for real.

TUG2000—August 13-18, 2000

by Jonathan Kew and Lorna Priest

The TeX Users Group meets once a year and this year the conference was held in Oxford, England. It was great to meet many of the people who develop TeX/LaTeX packages. Hopefully contacts made will continue to be of benefit to us. Jonathan hadn’t been to a TUG conference in a number of years and so was interested in hearing about many of the latest developments. Lorna has only been using TeX/Omega for a year and so it was helpful for her to get a more well-rounded picture of TeX and its extensions and how they all work together.

Multilingual

Interest in Unicode is definitely growing in the TeX community, just like the rest of the industry. The math & science people, who are major TeX users, are seeing the need to get their specialized symbols included in the standard.

LaTeX has a long tradition of multilingual support within the world of 8-bit encodings and simple left-to-right scripts (primarily via the Babel package). This continues to be well supported and developed, but is of interest primarily for Latin and Cyrillic scripts.

Currently, the major route to Unicode support in the TeX world is Omega. This gives (in theory, at least) complete Unicode support, and should work with any TeX macro package. The big hurdle at this point is font support, including the fact that complex scripts require the behavior to be expressed via Omega’s processes; there’s no linkage to OS-level features such as OpenType. This means Omega is independent of the operating system, but on the other hand it means that script engineering would have to be redone for Omega. And TrueType fonts are not well supported (this is primarily a function of the output device drivers and the associated font tools, not Omega itself).

Omega is still somewhat in an experimental stage, but is a solid piece of work that has a promising future. With the increasing interest in Unicode (including pressure from XML) and multilingual issues, the TeX world may be just about ready to embrace Omega.

LaTeX

The LaTeX 3 project looks as though it is coming close to having a system for configuring document styles and handling page layout (including multiple columns, floating figures, etc.) that could be really useful. Although it is not really ready for use, we will try to keep an eye on it, as it has the potential to be very useful to us in the next year or so. There are many add-on packages out there which could meet our needs for different types of publishing; an example we saw was the Amsrefs (LaTeX) package, which has potential to help us in publishing dictionaries. We will still need people with the ability to write TeX packages “from scratch” for non-standard document types, however.

XML

The ability to read XML is another definite growth area, as you would expect. xmltex, a macro package that implements a full XML parser with hooks for you to attach formatting code to whatever elements you like, looks very impressive. The main limitation at this point is Unicode support; it interprets UTF-8, but font/writing system support is limited to what the underlying TeX system has. However, the developer intends to move to Omega as a base in order to get full Unicode support. (xmltex is written entirely in TeX macros, so should run on top of any TeX-compatible system, including TeXGX or ATSUI-TeX, when available.)

As far as page layout is concerned, an xmltex-based system is dependent on the underlying TeX engine; this can be LaTeX, or some other custom package. Given that the page layout capabilities of LaTeX are also improving, there’s real potential here.

There are a couple of ways this could be used to typeset an XML version of a dictionary or any other data. One could attach formatting code directly to the various element types, resulting in a custom setup for that particular document type. Alternatively, one could use PassiveTeX, a package derived from and built on xmltex that handles XSL Formatting Objects; an XSLT (Extensible Stylesheet Language Transformations) processor could (given the proper stylesheet) transform the original XML document into serialized Formatting Objects, which can then be typeset using PassiveTeX’s LaTeX-based setup.
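The first approach, attaching formatting code directly to element types, can be sketched outside TeX. The Python below is not xmltex and does not use its interface; the element names, the sample entry, and the function name are all invented for illustration. It maps each element type of a small dictionary entry to a piece of LaTeX markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary-entry markup; the element names and content
# are invented for this sketch.
SAMPLE = "<entry><headword>abc</headword><pos>n</pos><gloss>sample gloss</gloss></entry>"

# One formatting template per element type.
FORMATTERS = {
    "headword": r"\textbf{%s}",
    "pos": r"\textit{%s}",
    "gloss": "%s",
}


def typeset_entry(xml_text: str) -> str:
    """Turn one <entry> element into a line of LaTeX source.

    TeX special characters in the text are not escaped here; a real
    setup would need to handle them.
    """
    entry = ET.fromstring(xml_text)
    parts = []
    for child in entry:
        # Unknown element types fall through unformatted.
        template = FORMATTERS.get(child.tag, "%s")
        parts.append(template % (child.text or ""))
    return " ".join(parts) + r"\par"


print(typeset_entry(SAMPLE))
# \textbf{abc} \textit{n} sample gloss\par
```

The table of templates is specific to one document type, which is the trade-off noted above: the setup is direct but has to be rewritten for each kind of document, whereas the Formatting Objects route moves that knowledge into a stylesheet.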

TeX extensions

There are several “extended TeX” systems around at the moment: pdfTeX, Omega, e-TeX, VTeX, etc. Probably the most interesting to us are pdfTeX (because general workflow and in particular font management can be simplified, compared to other systems) and Omega (because it is a truly 16-bit, soon to be 31-bit, system with extensive support for “interesting” writing systems). In principle, it sounds attractive to consider bolting pdfTeX’s back-end onto Omega; I don’t know if this might really be feasible, but we tried to encourage one of the Omega developers in that direction.

In the longer term, the New Typesetting System (NTS) project has tremendous potential. NTS is a complete re-implementation of TeX in Java. Since TeX was declared “frozen” by its creator, Donald Knuth, the NTS project was started so as to be free of the constraints Knuth had placed on any further development of TeX. Everyone who has worked on extending TeX recognizes that the program architecture, designed to work on what was reasonable hardware 20 years ago, makes major changes really hard. NTS sounds like it is well designed for flexibility and extensibility; the issue now is whether they can bring it up to an adequate level of performance (the current Java version is up to 100 times slower than Pascal/C TeX!) or possibly port it over to C++ or something else.

Fonts

Almost everyone who discussed font-related issues was talking about Metafont or Type 1 fonts. This was a disappointment, as it would have been nice to hear more on TrueType usage, since this is the format for most of our SIL fonts.

Miscellaneous

The AsTeX Assistant and Navigator is a utility for navigating DVI/PDF/PS previews of typeset TeX documents, based on the auxiliary files (index, table of contents, etc.) generated by LaTeX. It is Windows-based and interacts with various viewers for the various file formats. It is very slick, and even if we’re not using LaTeX, another macro package could write out suitable info, so the preview of a large document could be navigated in terms of a chapter/section outline, for example. I wish I had something like this in TeXGX! All of the documentation is in French, which makes it somewhat difficult to get started if you are not a French speaker.

SeyboldSF—August, 2000

by Dennis Drescher

I attended SeyboldSF this year for the first time. There were a number of others from International Publishing who attended along with me. I had several goals in mind before starting. They can be summed up as follows:

  • Find out how publishing applications are progressing in the area of Unicode and XML, especially with regard to non-Roman (NR) issues.
  • Look for a potential replacement for Ventura.
  • Pick up some resource material on XSL that goes into some depth on stylesheet writing.
  • Learn as much about writing XSL stylesheets as possible.

The first three goals were to be accomplished by attending the exposition, the last goal by attending the Publishing special interest day and tutorials on Thursday and Friday.

Some progress has been made in XML and Unicode in high-end publishing apps in this past year. However, all except one fall far short in providing a comprehensive solution to publishing in NR writing systems. Very few are implementing Unicode and there was only one there that I saw that provided round trip XML support. “Round trip” means that XML is native, not just an add-on feature. It has the ability to read and write in XML and supports validation against an XML schema. Any app that can’t do this is hardly worth looking at. There were many that fit that category.

The only application at this time that I saw as a potential solution to at least some of our publishing needs was an application called 3B2 (named after the building it was developed in). When I showed them samples of the NR publishing we need to do, they indicated that their product would be up to the challenge. This may or may not be true, but they at least understood the problems better than most there.

The problem with 3B2 is that it could take $100,000 to deploy. That might make it more worthwhile to continue with TeX development and use that as our NR publishing tool.

As for a replacement for Ventura, those of us at Seybold from IPub and NRSI all thought that FrameMaker by Adobe would be most suited for that task. It was the closest thing there that fit our publishing model. Plus, Adobe seems to be committed to implementing Unicode and XML in FrameMaker, which will at least meet our future Roman needs.

I wasn’t able to find any useful resource material on XSL there. It seems publishers are hesitant to produce anything on that subject because of its rapid obsolescence. It is still a developing standard. Maybe if they published in a newspaper format, we wouldn’t have to feel so bad about throwing it away in the near future. :-)

Learning about XSL was both a joy and a frustration. Wednesday’s and Thursday’s sessions were mostly overview and review for me. Friday, on the other hand, was great. I was signed up for the morning tutorial entitled “Capturing and Managing Information Assets using XML.” Some of this was review too, but the instructor’s teaching style made it very interesting, and he had the ability to work with a broad range of people and give them all something. Since there was nothing planned that afternoon, and I had enjoyed his morning session so much, I asked if I might be able to attend the afternoon session as well. We were able to make the arrangements and I found that session to be the best of all. I even found some immediate solutions to some XSL implementation problems I was having.

Points of interest for the week:

  • Quote by Friday’s instructor: “XML is not a markup language, it is a syntax used to describe other markup languages.” (That was a profound statement to me.)
  • I obtained a trial copy of Epic by ArborText. This is an XML authoring/editing application.
  • I found a company called Acutrack that does silk-screening of CD-Rs in small quantities, which, up until this past week, I had been told was not possible.
  • One of the more interesting booths was HV Ltd. They provide a plugin for Word 2000 that allows users to export valid XML. Since it’s being done in Word, it makes it easier for users to get up and running. It has a lot of potential; however, the big drawback is that they want $850 per seat for it, with a 10-seat minimum purchase. I have an evaluation copy to test.

These were the items that interested me the most. There was much more but SeyboldSF is a huge conference and Expo. There is so much to see and learn, and I think it was worth going to. I think that, in the future, whoever goes to Seybold from the NRSI should probably spend more time with the vendors. I feel that I should have spent more time showing them what our needs are and tried to make more industry contacts. Some companies want to position their product in the international market. Showing them where they might be falling short could be beneficial for both us and them.

16th/17th International Unicode Conferences

by Peter Constable

The 16th International Unicode Conference was held in Amsterdam, Netherlands, March 27–30, 2000, and the 17th International Unicode Conference was held in San José, California, September 5–8, 2000. As usual, each time there were two days of tutorial sessions followed by two days of conference, and as usual there were both plenary and parallel sessions (three separate sessions going concurrently most of the time).

The spring conference is usually smaller, and that was true of the Amsterdam conference. Understandably, the European venue did bring a different audience than the North American conferences do, which was both interesting and useful (I got to meet some people I had interacted with by e-mail whom I otherwise would not have met). There were five of us from SIL in attendance: Sharon Correll (Language Software Development, hereafter LSWD; and NRSI), Bob Hallissy (NRSI), Sue Newland (NEG), Lorna Priest (NRSI), and me.

This fall’s conference was the largest yet: there were about 450 participants, which represents a substantial growth since the San José conference a year ago. There were five of us from SIL that attended: Sharon Correll (LSWD/NRSI), Martin Hosken (NRSI/MSEAG), Greg Lyons (MSEAG), Ken Zook (LSWD), and me. Joining us was Marc Durdin, our partner from Tavultesoft and the author of Keyman.

I have gone to the past four conferences, held in the spring and fall, and have established relationships with many of the conference regulars. It was useful to have a group of others from SIL at the past two conferences, starting to get to know people and giving more visibility to SIL.

Attending the Unicode conferences has had potential for benefits in several ways:

  • Providing an opportunity for key SIL people to learn about Unicode and related technologies for software internationalization.
  • Keeping abreast of developments within industry that may have potential for some of our multilingual and multiscript computing needs.
  • Building relationships with people in industry that could lead to beneficial interactions.
  • PR for SIL, and for technologies developed within SIL (e.g. Graphite) or by close SIL partners (e.g. Keyman).
  • Interacting with people from industry regarding specific projects of potential mutual interest.

We have continued to realize benefits in each of these areas. I’ll comment on some of them.

With regard to education, Sue Newland, Lorna Priest and Greg Lyons all felt particular needs to learn, and all of them found attending the conference helpful in that regard. For example, in NEG, Sue faces some complex problems supporting users working with a variety of encodings using different applications on several operating systems. Sue mentioned afterward that it was especially helpful for her to gain a better understanding of the Unicode standard and of some of the issues related to character encoding. Those of us that came with more knowledge of Unicode also benefitted. As a developer, Ken Zook mentioned learning things that he needs to keep in mind as he writes his code. Even those of us whose jobs include being Unicode “gurus” had opportunities to learn more. For example, I found it helpful in Amsterdam to learn more about internationalization issues for the World Wide Web, such as how things like encodings and language identification are handled.

Relationship building has been of particular benefit, I think, in a strategic sense. For example, we have several valuable contacts at Microsoft, most of which came about through our interactions with people at Unicode conferences. I didn’t meet as many new people at these last two conferences, but I did meet several with whom we might at some point have beneficial interactions. (For example, I met a number of people at the past two conferences who are interested in the work I have been doing lately regarding language identification, and who would like to see the same kinds of changes in industry standards that we would like to see.) We were also able to strengthen some existing relationships, such as those with our contacts at Microsoft.

With relationships in place, we have been able to pursue some specific opportunities to work with people in industry on things of mutual interest. For example, at the Amsterdam conference, Bob Hallissy and I had some very useful interactions with people from Microsoft about ideas for overcoming limitations in Unicode support in Win9x/Me. We had a specific idea for solving a key problem and were able to present it to them, and they have begun to implement that solution in their software. This could be very helpful for us in providing practical solutions for users in a number of field locations. (For details, see the article on WM_UNICHAR.)

Another example of this kind of interaction was the set of discussions I was able to have with people in San José about the problems we face with language identification. (For details, see the article on Problems with language identification.) I was able to discuss some of the issues with key people involved with industry standards, and to get their support for what we hope to do.

Another example worth mentioning was the interaction that Martin Hosken and I had with Mark Davis (IBM, active member of the Unicode Technical Committee) and others discussing encoding conversion. Mark has been drafting a Unicode Technical Report that specifies a standard way to describe encoding conversions, but the drafts lacked some key capabilities that would be needed for many of the custom encodings we have used in SIL (including that for IPA93). Martin was able to convince the others of the need to go beyond what they had been thinking, and a compromise was reached that will give us what we believe we need. At this point, it appears that Martin will become a co-author of that document.

At these past two conferences, Sharon Correll has given a presentation on the Graphite rendering system, including a demo. This made a very big impact on people in Amsterdam, and also made a significant impact in San José on those who saw it there for the first time. Sharon found it very rewarding to see someone like Donald Knuth (creator of TeX) get excited about Graphite. Graphite is very definitely providing good visibility for SIL, and is helping to create relationships. The challenge is deciding which ones would be most beneficial, in terms of our goals, for us to pursue. For example, there is considerable interest in getting Graphite into an open source development model and in porting it to Unix/Linux. We also had an invitation from a contact at Sun Microsystems to make it possible to bolt Graphite rendering into Java 2D. At this past conference, we had some discussion with Peter Lofting of Apple about potential interest in Graphite's GDL language for use in other systems.

Marc Durdin was also able to meet a number of people in San José, and to show the latest beta of Keyman. He had several people very impressed with his work, including some potential major customers (e.g. the US Government). This is good news for us, since the success of Keyman will lead to a better product.

Altogether, I feel our involvement at these two conferences was very successful in many regards. Now it’s time to start thinking about the next conference: IUC 18 will be held in Hong Kong, April 24–27, 2001. A call for papers has already been issued. We’ve given some initial thought to potential papers, and to who from SIL it would be most beneficial to send. If you have ideas for a paper, or if you think it would be particularly helpful for someone from your entity to attend and would like more information, please let me know.

Circulation & Distribution Information

The purpose of this periodic e-mailing is to keep you in the picture about current NRSI research, development and application activities.


© 2003-2017 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.