NRSI: Computers & Writing Systems

Short URL: http://scripts.sil.org/EncCnvtrs_Obsolete

SILConverters — Obsolete version

Microsoft Word/COM support for TECkit, CC, and ICU

Bob Eaton, Mark Penny, 2005-04-21

(updated: 2005-04-21)

Obsolete version

Please note that this product has been replaced by SILConverters 4.0 and you are strongly encouraged to use that product. This page is retained for those who, for whatever reasons, are unable to use the new version and require the older, unsupported, version.

This package provides a system-wide repository for encoding converters and transliterators (TECkit, CC, or ICU based) and a simple COM interface for selecting and using a converter from the repository. It is easy to use from VBA or C++. An included VBA macro provides a simple interface for managing and using the repository, making it easy to convert any file (e.g. SFM texts, lexicons, and even Word documents) to a different encoding based on one or more TECkit maps and/or CC tables. The macro interface can also install user-developed converters into the repository and remove them from it.

Installation and use

Just extract the install set zip file to your hard disk (with the “Use folder names” switch selected) and run the Setup.exe program to install it. The package includes the TECkit, CC, and ICU runtime files, so it does not require a separate installation of these packages.

The user interface is relatively simple to master:

Data Conversion Macro main window



Notice three distinct areas:

  1. Conversion table details: This is where you select one of the Conversion Tables from a list; you can also add new tables to the list.
  2. Scope of change: Here you can restrict the scope of the conversion (apply changes to the whole document, a selection, particular backslash markers, or a specific font).
  3. Target Data: Finally, you can optionally reformat the converted data by specifying a style or font.

Further information is in the zip file available below, together with the utility itself and a set of data files that it requires.



Download

Encoding Converters full archive, version 2.35b
Bob Hallissy, 2004-05-24
Download "EncCnvtrs235b.zip", ZIP archive, 6MB

Frequently Asked Questions

Note

The Bulk Word document converter in the newest versions of SILConverters 4.0 resolves most of these issues. There is also an Office converter in the newest versions of SILConverters 4.0 which resolves other issues.

Question: When I try to convert my file using the SILConverters 4.0 package some of the characters are not being converted. If I select the text that was not converted correctly and run the Unicode Word Macro “Show Unicode” I see that Word thinks every one of these is U+0028 LEFT PARENTHESIS. What is happening?

Answer: Our SIL legacy fonts have traditionally been encoded as symbol fonts. People who want to convert their data to Unicode using the Microsoft Word/COM support for TECkit, CC, and ICU package have run into problems when they entered their data using Insert / Symbol. The characters which were inserted in this manner do not convert correctly. They are converted to U+0028 LEFT PARENTHESIS.

Note

Characters that were inserted into the Word document using Insert / Symbol are encoded in such a way that the macro cannot identify them.

The way to fix this is to follow these steps:

  • Close the file without saving changes.
  • If you have OpenOffice [1]:
    • Open your document with the OpenOffice Writer program. You may be able to right-click on the file, select Open With, and then OpenOffice.org 2.0 from the context menu. Otherwise you need to start the OpenOffice program, navigate to the file and open it.
    • In OpenOffice Writer, select File / Save As. Change the “File name:” field to “[your document name]OO.doc”. This will keep your original file intact in case you run into problems.
    • Click the Save button and then close OpenOffice Writer. Now double-click on the newly created “[your document name]OO.doc” file to open it in Word.
  • If you do not have OpenOffice:
    • Open your document with the WordPad program. You may be able to right-click on the file, select Open With, and then WordPad from the context menu. Otherwise you need to start the WordPad program, navigate to the file and open it.
    • In WordPad, select File / Save As. In the dialog box that comes up, make sure that the “Save as type:” field is set to “Rich Text Format (RTF)”. You must do this in WordPad; this step will not work in Word.
    • Change the “File name:” field to “[your document name].rtf”.
    • Click the Save button and then close WordPad. Now double-click on the newly created “[your document name].rtf” file to open it in Word.
  • From the Tools menu (or, in Word 2007, from Add-Ins), select the Data Conversion menu item.
  • For “Name:”, select “SIL IPA93 <> UNICODE”.
  • For “Scope of change”, select “A specific regular font:” and choose “SILDoulos IPA93” from the list.
  • For “Target Font”, check the “Apply specific Font” box and choose “Doulos SIL” from the list.
  • Check “Preserve character formatting” or you will lose in-line bold and italic formatting [2]. Click OK.

Note

The exact sequence of commands may vary depending on your version of Word.

Bug:

The WordPad solution above does not work as well as the OpenOffice solution. WordPad does not recognize autonumbering or footnotes and these will be lost in WordPad.

Question: I have a lot of autonumbering in my document and do not have OpenOffice installed and do not want to use the above WordPad solution. Is there some other way to convert my legacy data to Unicode?

Answer: You could still run the Microsoft Word/COM support for TECkit, CC, and ICU package macros on your data.

First, though, you will want to convert all the characters you inserted into the text with Insert / Symbol to the proper legacy codepoint. Let's say that the schwa is not converting properly because it was inserted with Insert / Symbol. Copy that character into the Find field (you cannot just type it in), and in the Replace field either type the correct Symbol encoding or go ahead and enter the proper Unicode character, which in this case is U+0259. This character will probably not display properly until you format it with Doulos SIL.

Finally, you should run the Microsoft Word/COM support for TECkit, CC, and ICU package macro on your data.

Note

This solution will not work if you are running Win98 (or a version of Word earlier than Word XP), because you cannot successfully search for any IPA93 characters, unless you feel up to some number crunching. Here is that solution:

Take the decimal value of the character, add it to 61440 (= F000 hex), then use Alt-key typing (hold down the Alt key and type 0 [zero] followed by the result of the addition on the number pad). So, for example, to search for a c in SILDoulos IPA93, I hold down Alt and type 061539 on the number pad.
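
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation; the function name `alt_code` is made up for this example and is not part of any SIL tool.

```python
SYMBOL_FONT_OFFSET = 0xF000  # 61440 decimal, the offset given in the text


def alt_code(ch: str) -> str:
    """Return the digits to type on the number pad while holding Alt."""
    if len(ch) != 1 or ord(ch) > 0xFF:
        raise ValueError("expected a single 8-bit legacy character")
    # A leading zero, then the character's decimal value plus 61440.
    return "0" + str(SYMBOL_FONT_OFFSET + ord(ch))


# "c" has decimal value 99, and 61440 + 99 = 61539.
print(alt_code("c"))  # prints 061539
```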

Question: I noticed in conversion that the IPA in the footnotes did not convert. And when I tried to do the conversion piece-by-piece funnier things happened. I selected one SILIPA93 character in a footnote, tried the Data Conversion macro, and the font fields came up blank. Any ideas what else I can do?

Answer: The most likely solution is to make sure that you check the “Include footnotes” checkbox under “Scope of Change” in the Data Conversion dialog box.

Otherwise, it may be that you have inserted them as endnotes. If you use the Word command to convert endnotes to footnotes, the macro will convert SILDoulos IPA93 text to Unicode. (But it won't convert the characters you inserted using Insert / Symbol). Then you can run the Word command to convert the footnotes back to endnotes.

This Word command is Insert / Reference / Footnotes, then click Convert. This will convert your endnotes to footnotes if you have endnotes, and footnotes to endnotes if you have footnotes.

Question: How do I use SILConverters to convert a Publisher document to Unicode?

Answer: The easiest way is to open your legacy Publisher document. Go to the first text box containing data you wish to convert. Click Edit / Edit Story in MS Word. Wait a minute for it to copy your data and open Word. Do your Data Conversion as you would normally do. Then click the small x to close the document. It will then take the converted data back to your Publisher document and display it there.

Repeat for each text box.

Question: I have Standard Format Marker (sfm) text files and want to convert only one or two of the sf markers from an SIL IPA93 encoding to Unicode. Can I do that?

Answer: SFConv is a command-line utility which can convert sfm files to and from Unicode. Various sfms can be converted with different mapping files.

  • Before you start, you should make a note of all the sfms in your data that use the IPA93 font.
    • If you also use inline markers (such as “|”), you should make a note of those as well.
  • Download the appropriate version of TECkit (Windows or Mac). These instructions assume a Windows installation.
  • Under C:\Program Files create a folder called TECkit.
  • Unzip the file you downloaded to this folder.
  • Download and unzip the IPA93 mapping file (put it in the C:\Program Files\TECkit folder).
  • Download the IPA93-map.xml file (below) and put it in the C:\Program Files\TECkit folder.
IPA93-map.xml file for use with SFConv
Lorna A. Priest, 2006-02-10
Download "IPA93-map.xml", XML document, 1KB

Open this file in a text editor. Because your data will have different sfms and inline markers you will probably need to adapt it.
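
If your data file is large, a short script can help with the note-taking step by listing the markers that actually occur in it. The following is a minimal sketch in Python, not part of TECkit. It assumes that sfms start a line with a backslash, that inline markers start with “|” and enclose their text in braces, and that marker names use only letters and underscores. The sample text and the marker names `lx` and `ge` are invented for illustration.

```python
import re
from collections import Counter

# Assumption: an sfm is a backslash at the start of a line plus letters/underscore.
SFM_RE = re.compile(r"^\\([A-Za-z_]+)", re.MULTILINE)
# Assumption: an inline marker is "|" plus letters/underscore.
INLINE_RE = re.compile(r"\|([A-Za-z_]+)")


def count_markers(text: str):
    """Return (sfm counts, inline marker counts) found in the text."""
    return Counter(SFM_RE.findall(text)), Counter(INLINE_RE.findall(text))


# Invented sample data; replace with the contents of your own file.
sample = "\\lx kala\n\\ph k|ip{a}la\n\\ge fish\n\\ph kala\n"
sfms, inline = count_markers(sample)
print(sorted(sfms.items()))
print(sorted(inline.items()))
```

The script only tells you which markers exist. You still have to decide which of them hold IPA93 data.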

To understand the file, look at this part of the code:

<sfMarkers escape="\"
          chars="abcdefghijklmnopqrstuvwxyz_ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          mapping="ISO-8859-1">
     <marker name="ph" mapping="silipa93"/>
     <marker name="pm" mapping="silipa93"/>
</sfMarkers>

The first line tells TECkit what characters form sfms and what mapping to use to convert the text of the markers to Unicode (i.e. I don’t want to change “\ph” to an IPA93 character). Then there is an exception list. So <marker name="ph" mapping="silipa93"/> is saying that for the sfm ph the silipa93 mapping file should be used. (It should also be used for pm.) You need to add a line for each of the sfms in your data that use the IPA93 encoding. If you have other sf markers that need a different mapping file, you can add another line, following this example, and use the appropriate sf marker (without the backslash) and mapping file name.

Note: SFConv converts all the data in the file in a single pass, not just the IPA fields that you define. The default mapping tells it what to do with the rest of the file.
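
The exception-list rule can be pictured as a simple lookup: a marker that is listed gets its own mapping, and any other marker gets the default. The Python sketch below only illustrates that rule and does not show how SFConv is implemented. The escape value is assumed to be a backslash, the `default` argument is a stand-in because this page does not show where the default mapping is declared, and the marker `ge` is invented.

```python
import xml.etree.ElementTree as ET

# Exception list from the text; the escape value is assumed to be a backslash.
SNIPPET = r'''<sfMarkers escape="\"
          chars="abcdefghijklmnopqrstuvwxyz_ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          mapping="ISO-8859-1">
     <marker name="ph" mapping="silipa93"/>
     <marker name="pm" mapping="silipa93"/>
</sfMarkers>'''


def mapping_for(block, marker, default):
    """Return the mapping listed for a marker, or the default if it is not listed."""
    for entry in block.findall("marker"):
        if entry.get("name") == marker:
            return entry.get("mapping")
    return default


block = ET.fromstring(SNIPPET)
for name in ("ph", "pm", "ge"):  # "ge" is an invented marker with no entry
    print(name, "->", mapping_for(block, name, default="(default mapping)"))
```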

Next we will look at inline markers.

Take a look at the following lines in the xml file.

<inlineMarkers escape="|" start="{" end="}"
          chars="abcdefghijklmnopqrstuvwxyz_ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          mapping="ISO-8859-1">
     <marker name="en" mapping="ISO-8859-1"/>
     <marker name="ip" mapping="silipa93"/>
</inlineMarkers>

The first line tells TECkit what characters form inline markers and what mapping to use to convert the text of the markers to Unicode (i.e. I don’t want to change “|ip” to an IPA93 character). (You can change the “|” to another character, but do not try to use < or >.) Then there is an exception list. So <marker name="ip" mapping="silipa93"/> is saying that for text after the inline marker |ip the silipa93 mapping file should be used. You need to add a line for each of the inline markers in your data that use the IPA93 encoding. If you have other inline markers that need a different mapping file, you can add another line, following this example, and use the appropriate inline marker (without the “|”) and mapping file name.
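
If you have many markers to add, you could generate the extra lines instead of typing them. The sketch below uses Python's standard `xml.etree.ElementTree` on the inline marker block shown above; the same approach would apply to the sfMarkers block. The helper `add_marker` and the marker name `fv` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

SNIPPET = '''<inlineMarkers escape="|" start="{" end="}"
          chars="abcdefghijklmnopqrstuvwxyz_ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          mapping="ISO-8859-1">
     <marker name="en" mapping="ISO-8859-1"/>
     <marker name="ip" mapping="silipa93"/>
</inlineMarkers>'''


def add_marker(block, name, mapping):
    """Add a <marker> entry, or update the mapping if the name is already listed."""
    for entry in block.findall("marker"):
        if entry.get("name") == name:
            entry.set("mapping", mapping)
            return entry
    return ET.SubElement(block, "marker", {"name": name, "mapping": mapping})


block = ET.fromstring(SNIPPET)
add_marker(block, "fv", "silipa93")  # "fv" is a hypothetical inline marker
add_marker(block, "ip", "silipa93")  # already listed, so no duplicate is created
names = [entry.get("name") for entry in block.findall("marker")]
print(names)  # prints ['en', 'ip', 'fv']
```

Editing the file by hand in a text editor works just as well for one or two markers.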

Finally, to do the actual conversion from 8-bit to Unicode (UTF8), we need to go to the Command Prompt (MS-DOS) and type this (you will need to add the right path, depending on where you have put sfconv.exe):

"\Program Files\TECkit\sfconv" -8u -utf8 -c IPA93-map.xml -i filename.sfm -o filename-utf8.txt

filename-utf8.txt can be viewed in a Unicode text editor or in Word. The markers themselves should all be as expected, and the content of the ph, pm, and |ip markers should be in Unicode IPA.

If you want to do a test to see if it has done an accurate job of converting you can convert the Unicode (UTF8) file back to 8-bit text:

"\Program Files\TECkit\sfconv" -u8 -utf8 -c IPA93-map.xml -i filename-utf8.txt -o test.sf

If you do this, test.sf should be identical to filename.sfm.
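
One way to check that the two files really are identical is to compare them byte by byte. This Python sketch is not part of TECkit; the function name `first_difference` is invented here, and the file names are taken from the command line.

```python
import sys
from pathlib import Path


def first_difference(path_a, path_b):
    """Return the offset of the first differing byte, or None if the files match."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other; they differ where the shorter one ends.
        return min(len(a), len(b))
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python compare.py filename.sfm test.sf
    diff = first_difference(sys.argv[1], sys.argv[2])
    print("identical" if diff is None else f"first difference at byte {diff}")
```

A reported difference points you to the place in the data where the round trip changed something, which usually means a character the mapping does not handle.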

Note: Further information on SFConv can be found in the TECkit documentation (..\TECkit\documentation\TECkit version 2.doc.pdf).


[1] We are grateful to Bill Jancewicz for the OpenOffice portion of this solution.
[2] Unfortunately, checking this box means that all footnotes are kept from being converted as well. If your document has footnotes, click Cancel instead of continuing with this process.

© 2003-2017 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative. Contact us here.