Short URL: http://scripts.sil.org/SILConverters22
Microsoft Word/COM support for TECkit, CC, and ICU
Please note that this product has been replaced by SILConverters 2.5, and you are strongly encouraged to use that product. This page is retained for those who, for whatever reason, are unable to use the new version and require an older, unsupported version.
SILConverters 2.2 was recently released in FieldWorks 3.1. It has a major interface change which makes this version incompatible with version 2.0 or earlier (it can coexist with SILConverters 2.1). There are several new enhancements listed below.
This package provides a system-wide repository for encoding converters and transliterators (TECkit, CC, or ICU based) and a simple COM interface to select and use a converter from the repository. It is easy to use from VBA, C++, C#, or any .NET/COM enabled language. An included VBA macro provides a simple interface to manage and use the repository, making it easy to convert any file (e.g. SFM texts, lexicons, and even Word documents) to a different encoding based on one or more TECkit maps and/or CC tables. The macro interface also provides the ability to install and remove user-developed converters to the repository.
This package is fully integrated with SIL FieldWorks, providing a single system-wide registry of installed and available encoding converters. Additionally the package includes some extra utilities such as a clipboard converter for manipulating text between cut and paste operations.
Major changes to SIL Converters 2.2
Three New Transducer Engines:
The Perl Expression plug-in allows you to write Perl expressions to do text processing in EncConverter client applications, such as Microsoft Word, FieldWorks (new in v3.1), and AdaptIt (also new in v3.1). This plug-in is included in the basic package (i.e. SILConvertersSetup.exe below), but requires a separate Perl 5.8 distribution to be installed.
The Perl plug-in has been tested with the following freely available Perl distributions: ActiveState Perl and PXPerl.
The following Perl snippets show the kind of expressions that are possible:
# Reverse a string of text
$strOut = reverse($strIn);
# turn all non-vowels into 'C' and vowels into 'V'
$strInOut =~ s/[^aeiou ]/C/g;
$strInOut =~ s/[aeiou]/V/g;
The Python Script plug-in allows you to do text processing using Python functions in EncConverter client applications. This plug-in is also included in the basic package, but it requires a separate Python 2.4 distribution to be installed.
The following Python snippets show the kind of expressions that are possible:
# Turn a string of text into upper case
# get the Unicode code point names for input string
import unicodedata
r = u''
for ch in u:
    r += '%s; ' % unicodedata.name(ch)
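To try both ideas outside the plug-in, here is a minimal standalone sketch in present-day Python 3 (the plug-in itself targeted Python 2.4). The function names to_upper and code_point_names are illustrative assumptions and are not part of the EncConverters plug-in interface.

```python
import unicodedata


def to_upper(text):
    """Turn a string of text into upper case."""
    return text.upper()


def code_point_names(text):
    """Return the Unicode character name of every character in text."""
    result = ''
    for ch in text:
        # Characters without a name (e.g. control characters) fall back
        # to a U+XXXX label instead of raising ValueError.
        name = unicodedata.name(ch, 'U+%04X' % ord(ch))
        result += '%s; ' % name
    return result


print(to_upper('abc'))
print(code_point_names('aB'))
```

The fallback label is a design choice of this sketch; the original snippet would raise an error on an unnamed character.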
In addition to the existing ICU Transliterator and Converter capabilities, v2.2 also has ICU Regular Expression support. This allows you to do text processing (primarily match and replace) with Regular Expressions in EncConverter client applications.
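As a rough illustration of match-and-replace processing, the sketch below uses Python's own re module rather than ICU, so the pattern and replacement syntax may differ in detail from what the ICU plug-in accepts. The function names are assumptions made for this example.

```python
import re


def collapse_spaces(text):
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r' {2,}', ' ', text)


def swap_pairs(text):
    """Turn each 'key=value' pair into 'value=key' using captured groups."""
    return re.sub(r'(\w+)=(\w+)', r'\2=\1', text)


print(collapse_spaces('a   b  c'))
print(swap_pairs('x=1 y=2'))
```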
New (parallel) Compound Converter type: Primary-Fallback
New in v2.2 is a Primary-Fallback converter, which allows you to specify two converters that operate as follows:
- If the primary converter changes the input string, the changed string is returned.
- If the primary converter doesn’t change the string, then the fallback converter is called.
This can be useful for complex transliteration processes where a simple algorithmic process is not sufficient. The exceptional (lexical) cases can be implemented as the primary converter and, if a match is not found, the fallback (algorithmic) process is used.
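The rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the package's implementation; the exception table and the stand-in algorithmic converter are made up for the example.

```python
def primary_fallback(primary, fallback):
    """Build a converter that tries primary first, then fallback."""
    def convert(text):
        result = primary(text)
        if result != text:
            # The primary converter changed the input: use its result.
            return result
        # The primary converter left the input untouched: use the fallback.
        return fallback(text)
    return convert


# Hypothetical lexical exceptions handled by the primary converter.
EXCEPTIONS = {'delhi': 'DILLI'}


def lexical(word):
    return EXCEPTIONS.get(word, word)


def algorithmic(word):
    # Stand-in for a rule-based transliteration.
    return word.upper()


convert = primary_fallback(lexical, algorithmic)
print(convert('delhi'))
print(convert('agra'))
```

Note that a primary converter which legitimately maps a word to itself is indistinguishable from "no match" under this rule, so such words always reach the fallback.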
Updated Clipboard EncConverter
New version supports:
- Filtering based on Encoding ID (e.g. UNICODE), Implementation Type (e.g. SIL.PerlExpression), and/or Process Type (e.g. UnicodeEncodingConversion or Transliteration)
- New menus to add, edit and delete converters in the System Repository (using the new Auto Configuration capability; see the new configuration functions described below).
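The filtering idea can be pictured as follows. This is a hypothetical sketch: the ConverterInfo record, its field names, and the sample entries are assumptions for illustration and do not reflect the actual repository format or API.

```python
from dataclasses import dataclass


@dataclass
class ConverterInfo:
    # Hypothetical record describing one converter in the repository.
    name: str
    encoding_id: str
    implementation_type: str
    process_type: str


def filter_converters(converters, encoding_id=None,
                      implementation_type=None, process_type=None):
    """Keep the converters that match every criterion that is not None."""
    selected = []
    for conv in converters:
        if encoding_id is not None and conv.encoding_id != encoding_id:
            continue
        if (implementation_type is not None
                and conv.implementation_type != implementation_type):
            continue
        if process_type is not None and conv.process_type != process_type:
            continue
        selected.append(conv)
    return selected


# Illustrative entries only; names and values are made up.
REPOSITORY = [
    ConverterInfo('Reverse', 'UNICODE', 'SIL.PerlExpression',
                  'Transliteration'),
    ConverterInfo('Legacy>Unicode', 'LEGACY', 'Example.Table',
                  'UnicodeEncodingConversion'),
    ConverterInfo('Script-Latin', 'UNICODE', 'Example.Rules',
                  'Transliteration'),
]

matches = filter_converters(REPOSITORY, encoding_id='UNICODE',
                            process_type='Transliteration')
print([conv.name for conv in matches])
```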
Updated SpellFixer application
New version supports:
- Display Rule command: to quickly find Bad Spelling/Good Spelling pairs in the replacement table to modify them.
- A warning is now displayed if you attempt to add a conflicting rule.
- Correct Whole Document commands: four new ways to process Word documents:
- word-by-word (as before)
- word-by-word but only for data in a particular
- Same as above, but using any arbitrary EncConverter, not just a SpellFixer project converter.
Several new functions have been added through which EncConverters plug-ins provide their own user interface for adding and configuring converters (e.g. to simplify adding EncConverter support to different client applications).
Client applications can call a function to launch the Select Converter dialog:
This dialog displays all the converters available in the System Repository and includes informational tool tips to give technical details about the converters.
In addition to selecting a converter, another function has been added to add a new converter based on its implementation or transduction type (cf. clicking the button above). This function brings up a dialog allowing users to select the transduction engine to use in creating a new converter:
This dialog is used to launch implementation-specific setup dialogs, such as this one for Perl Expressions:
Each such configurator has a help tab explaining the details of the converter type and a tab for checking the converter with sample data.
Developers wanting to add support for EncConverters using these new user-interface methods can see this webpage for details and code snippets.
New MapMaker Helper
Also available is a helper utility, called MapMaker Helper, which can be configured to display up to 2 EncConverters to help with creating compound converters and/or to check on the round-trip capability of a single (bi-directional) converter.
Here’s an image of the new utility with a bi-directional Hindi to Urdu transliterator. The third box shows the reverse direction to check for round-trip accuracy. Also, the bottom window gives the Unicode names for the selected string, which can be used to copy/paste into a TECkit map.
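The round-trip check that the utility supports interactively can also be pictured in code. The sketch below is an assumption-laden toy: the two mapping tables are invented (and deliberately lossy) and merely stand in for the forward and reverse directions of a real bi-directional converter.

```python
# Toy forward table: both 'c' and 'k' map to the same target character,
# so the mapping cannot be reversed without loss.
FORWARD = {'a': 'α', 'b': 'β', 'k': 'κ', 'c': 'κ'}
REVERSE = {'α': 'a', 'β': 'b', 'κ': 'k'}


def transliterate(text, table):
    """Map each character through table, leaving unknown ones unchanged."""
    return ''.join(table.get(ch, ch) for ch in text)


def round_trip_failures(words, forward, reverse):
    """Return (original, converted, restored) for words that do not survive."""
    failures = []
    for word in words:
        converted = forward(word)
        restored = reverse(converted)
        if restored != word:
            failures.append((word, converted, restored))
    return failures


failures = round_trip_failures(
    ['bak', 'cab'],
    lambda s: transliterate(s, FORWARD),
    lambda s: transliterate(s, REVERSE),
)
print(failures)
```

Running a word list through such a check points to the places in a mapping where two inputs collapse into one output.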
Bulk SFM File Converter
Also available is a utility, called SFM Converter, which can be used to convert one or more SFM documents using converters in the system repository.
Here’s an image of the new utility being used to convert an SFM document that is encoded with the non-Unicode font Annapurna.
This program can also convert Unicode-encoded SFM documents: choose the encoding of the file as you open and save it.
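To make the idea concrete, here is a minimal sketch of marker-aware SFM conversion in Python. It is not the SFM Converter's code: the function names, the handling of continuation lines, and the default output encoding are all assumptions of this example, and convert can be any string-to-string function.

```python
def convert_sfm_lines(lines, convert, markers=None):
    """Apply convert to the data of selected backslash markers.

    markers is a set such as {'\\lx'}; None means convert every field.
    Lines that do not start with a backslash are treated as continuations
    of the preceding field.
    """
    converted = []
    active = False  # whether the current field is being converted
    for line in lines:
        if line.startswith('\\'):
            marker, sep, data = line.partition(' ')
            active = markers is None or marker in markers
            if active:
                data = convert(data)
            converted.append(marker + sep + data)
        else:
            converted.append(convert(line) if active else line)
    return converted


def convert_sfm_file(src_path, dst_path, convert, src_encoding,
                     dst_encoding='utf-8', markers=None):
    """Read an SFM file in one encoding and write the result in another."""
    with open(src_path, encoding=src_encoding) as src:
        lines = src.read().splitlines()
    result = convert_sfm_lines(lines, convert, markers)
    with open(dst_path, 'w', encoding=dst_encoding) as dst:
        dst.write('\n'.join(result) + '\n')


sample = ['\\lx kitab', '\\ge book', 'continued gloss']
print(convert_sfm_lines(sample, str.upper, markers={'\\lx'}))
```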
Installation and configuration
Please note that the installation procedure for SILConverters is completely different from the old EncCnvtrs. It is also different from your typical Windows application; please note the following:
- You will need Administrator privilege on the computer to install this software.
- The system consists of separately downloadable components. The primary (and required) component contains the main setup files and programs. The optional components contain collections of converters that may be of interest. In the future, new converters will be made available by packaging them as additional optional components.
- When you run one of the downloaded components, it unpacks itself into a shared installer folder (SILConverters22) and then launches the setup program. The setup program then lets you configure exactly what programs and converters you want to be available on your system.
- Do not delete the shared installer folder (SILConverters22). You may, now or in the future, want to download additional converter modules, and they use this shared installer folder and setup program as well.
Step 1: Uninstall previous versions
You should uninstall EncCnvtrs version 1.5 and/or 2.0 before trying to install SILConverters. Both the 1.5 and 2.0 versions have uninstallers. However, several people have encountered problems in uninstalling them (perhaps due to some recent update in the Windows operating system). If your attempts to uninstall the "Encoding Converters" item(s) fail, then see the Uninstallation Guide for EncCnvtrs 1.5 and 2.0. SILConverters 2.1 can coexist with SILConverters 2.2 and, in fact, you may run into problems with some of your FieldWorks applications if you uninstall version 2.1.
Step 2: Get .NET
This package requires the Microsoft .NET Framework Version 1.1 Redistributable. This is a large download (22 MB), but it is supplied with a number of products, so you may already have it. If you are unsure whether or not you have it, simply try the next step of the install; it will fail if .NET isn't available.
Step 3: Install core components
Download and run the SILConvertersSetup program to create the shared install folder and install the main suite of utilities. When you are prompted for the folder to save the files, enter (or browse to) your 'Downloads' folder (e.g. C:\Downloads or My Documents\My Downloads). After unzipping the image, it will automatically start the SetupSC.exe program, which is the main installer where you select what options you want installed:
Use this configuration dialog to identify what add-ins, conversion maps, and document templates you want installed (or uninstalled). Some notes:
The installer's action button both installs and uninstalls components. Whether an item will be installed or uninstalled depends on the state of the checkmark beside that item, which can have three possible values:
- unchecked: the item is not installed or will be uninstalled.
- grey: the item is installed.
- checked: the item will be installed.
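One way to read the three checkmark states is as a small decision table. The sketch below encodes that reading in Python; it is an interpretation of the list above, not the installer's code, and the names Mark and planned_action are invented for the example.

```python
from enum import Enum


class Mark(Enum):
    UNCHECKED = 'unchecked'
    GREY = 'grey'
    CHECKED = 'checked'


def planned_action(mark, installed):
    """Return what the installer would do for one item."""
    if mark is Mark.CHECKED:
        return 'install'
    if mark is Mark.UNCHECKED and installed:
        return 'uninstall'
    # Grey (already installed, left alone) or unchecked and not installed.
    return 'none'


print(planned_action(Mark.GREY, installed=True))
print(planned_action(Mark.UNCHECKED, installed=True))
```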
Many of the items have "tips" — if your cursor waits over an item a popup tip will tell you more about it.
The Word Document Templates list includes all templates you have installed on your system, not just those that came with the SIL Converters package. (In the above example you see Hallissy Highlights A4.) While you can use this dialog to uninstall any templates you have, I do not recommend it unless you have a backup somewhere. The templates supplied with the package are Data Conversion Macro and SpellFixer.
For Word Document Templates, the installer looks in (and can install into) two different folders:
- In the first folder, installed templates are not automatically available, but they can be made available from within Word.
- In the second folder, installed templates will be available automatically every time you start Word, assuming Word's macro security is set to Medium.
Review all the components in the installer dialog. Components that you want added to the installation should be checked; components that you want removed from the installation should be unchecked. When ready, click the button that applies your selections, and wait for the installer to finish.
If you are installing Word Document Templates, you will get a warning that the templates have been installed only for the current user. Any other users who also want access to the templates will need to run SetupSC.exe.
Step 4: Download and install additional components
You may now install additional components such as converter modules for specific encodings. To install these:
- Click the download link for the desired add-in and run it (you may either save the file to your computer and then run it or run it directly). This will load the new items into the shared install directory (you didn't delete it did you?) and automatically launch the installer.
- When the installer window opens, select the additional converters you want to install.
- You can manually install additional TECkit converters that do not have an installer by following the instructions here.
At any time you can reconfigure what modules are installed and available by running the SetupSC.exe program from the shared install folder created in Step 3.
Using SILConverters from Microsoft Word
The user interface is relatively simple to master:
Notice three distinct areas:
- Conversion table details: This is where you select one of the Conversion Tables from a list; you can also add new tables to the list.
- Scope of change: Here you can restrict the scope of the conversion (apply changes to the whole document, a selection, particular backslash markers, or a specific font).
- Target Data: Finally, you can optionally reformat the converted data by specifying a style or font.
For further information about using the Word macros, use Windows Explorer to locate a copy of the Data Conversion macro template. Assuming you still have the shared install directory around (from Step 3 of the installation instructions), you can look in there for a copy. Right-click on the .DOT file and open it from the context menu. The template file has documentation in it.
In addition to the items listed below, the core download now includes the applications and command line utilities from the TECkit package. The applications, such as DropTEC.exe and TECkit Mapping Editor.exe, can be launched via menu items. The command line utilities, such as TxtConv.exe and SFconv.exe, are installed in the C:\Program Files\Common Files\SIL folder.
This core package contains the following items:
- EncConverters repository (Manages and provides an API to the collection of converters/transliterators)
- The CC, TECkit, ICU, Perl and Python run-time conversion engines.
- Clipboard EncConverter Add-in (Allows for using an EncConverter on clipboard data)
- SpellFixer Add-in (Adds programmatic search and replace capability to Microsoft Word)
- Data Conversion Macro Template (GUI for adding and using converters in Microsoft Word)
- A few sample converter maps and tables:
- Hex-Any (if you see hex escapes such as '\u092C\u0902\u0938\u0916\u091F', this converter will turn them into correctly displaying Unicode, i.e. 'बंसखट')
- UTF8UTF16 (if you see stuff like 'à¤¬à¤‚à¤¸à¤–à¤Ÿ', this converter will turn it into correctly displaying Unicode--i.e. 'बंसखट')
- null: do nothing converter that is useful for globally changing data in one font to another with the Data Conversion Macro in Word.
- NFC: normalizes a Unicode string to its fully composed form.
- NFD: normalizes a Unicode string to its fully decomposed form.
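What these sample converters do can be approximated with Python's standard library. The sketch below is only an analogy, not the converters themselves: it assumes the hex escapes use the backslash-u notation, and it assumes the garbled text arose from UTF-8 bytes being read as Windows-1252. The function names are invented for the example.

```python
import re
import unicodedata


def hex_to_text(text):
    """Replace backslash-u escapes (four hex digits) with their characters."""
    return re.sub(r'\\u([0-9A-Fa-f]{4})',
                  lambda m: chr(int(m.group(1), 16)), text)


def fix_utf8_mojibake(text, wrong_encoding='cp1252'):
    """Undo UTF-8 text that was mistakenly decoded as wrong_encoding."""
    return text.encode(wrong_encoding).decode('utf-8')


word = '\u092c\u0902\u0938\u0916\u091f'   # the Devanagari sample word
escaped = r'\u092C\u0902\u0938\u0916\u091F'  # literal backslash-u text
mojibake = word.encode('utf-8').decode('cp1252')

print(hex_to_text(escaped) == word)
print(fix_utf8_mojibake(mojibake) == word)

# NFC composes 'e' + combining acute accent into one code point;
# NFD splits the precomposed character back into two.
composed = unicodedata.normalize('NFC', 'e\u0301')
decomposed = unicodedata.normalize('NFD', '\u00e9')
print(len(composed), len(decomposed))
```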
This additional component provides an "Indic-specific" set of converters and utilities for the SILConverters suite. This package contains support for:
- ITrans to Unicode converters for the following languages:
- Hindi (e.g. "hindii" converts to हिन्दी)
- UTrans converters for Unicode presentation form of Urdu (Arabic)
- ISCII to/from UNICODE converters (all Indic ranges)
- Converters for the following fonts to both ISCII encoding and Unicode encoding based on the Font2Iscii converter set:
TECkit encoding converters (related to the Devanagari range of Unicode) for the following Legacy Indic fonts:
- CDAC-ISFOC encoding (c.f. WinScript, DV-TTYogesh)
- Unicode Devanagari IPA (phonetic transliteration)
This additional component provides a collection of ICU-based "Latin" transliterators for the following ranges of Unicode:
Note that these transliterators can be daisy-chained together to transliterate between non-Latin scripts. For example, chaining the 'Devanagari-Latin' transliterator (in the Forward direction) with the 'Tamil-Latin' transliterator (in the Reverse direction) gives a 'Devanagari-Tamil' transliterator.
Use the Add button in the Data Conversion Macro to daisy-chain converters/transliterators together.
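The daisy-chaining idea can be sketched with toy tables standing in for the real ICU transliterators. Everything here is illustrative: the three-letter tables cover only a few consonants with their inherent vowel, and the helper names forward, reverse, and chain are assumptions of this example rather than part of the package.

```python
# Tiny stand-ins for 'Devanagari-Latin' and 'Tamil-Latin'.
DEVANAGARI_LATIN = {'क': 'ka', 'म': 'ma', 'ल': 'la'}
TAMIL_LATIN = {'க': 'ka', 'ம': 'ma', 'ல': 'la'}


def forward(table):
    """Build a converter from the native script to Latin."""
    def convert(text):
        return ''.join(table.get(ch, ch) for ch in text)
    return convert


def reverse(table):
    """Build a converter from Latin back to the native script."""
    inverse = {latin: native for native, latin in table.items()}
    longest = max(len(key) for key in inverse)

    def convert(text):
        out = []
        i = 0
        while i < len(text):
            # Try the longest Latin chunk first, then shorter ones.
            for size in range(longest, 0, -1):
                chunk = text[i:i + size]
                if chunk in inverse:
                    out.append(inverse[chunk])
                    i += size
                    break
            else:
                # No table entry starts here: copy the character as-is.
                out.append(text[i])
                i += 1
        return ''.join(out)
    return convert


def chain(*steps):
    """Run the given converters one after another."""
    def convert(text):
        for step in steps:
            text = step(text)
        return text
    return convert


devanagari_to_tamil = chain(forward(DEVANAGARI_LATIN), reverse(TAMIL_LATIN))
print(devanagari_to_tamil('कमल'))
```

The Latin form acts purely as a pivot: nothing in the chain needs a direct Devanagari-Tamil table.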
Adds the following converters for dealing with FindPhone encoded data:
- FindPhone>SAG IPA93
This new plug-in can be used to configure up to 2 EncConverters to be automatically executed on text typed into an edit box, i.e. a helper utility for developing TECkit maps and/or compound converters.
This new utility can be used to convert one or more Standard Format Marker (SFM) documents using converters in the system repository. Works for both Unicode and non-Unicode encoded SFM documents.
Requires Microsoft .NET framework 2.0 (in addition to the version 1.1 required for the base SILConverters component).
This download contains a document template that provides a simple way of working with data (in any language, and any script) in Microsoft Word documents, plain text files or any Toolbox database to:
- Check consistency of spelling (semi-automatically) based on linguistic principles
- Apply global spelling changes:
- to multiple documents which are currently open, or
- by generating a CC table of changes to be applied to one or more plain text databases (such as Toolbox files)
- Create a character inventory with frequency count
- Create unique wordlists from one or more Word documents as:
- a Word document table with frequency counts, or
- a Toolbox (MDF-formatted) database for starting a lexicon
This tool is not a full-fledged spelling checking tool. It does not use language-specific dictionaries, and therefore knows nothing about the languages it checks. It is only a consistency checking tool based on phonological similarity, or sets of user-defined ambiguous characters.
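A few of these ideas (character inventory, word frequency list, and consistency checking through user-defined ambiguous characters) can be sketched in Python. This is not the macro's algorithm: the function names and the rule of collapsing each ambiguous set to its first member are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def character_inventory(text):
    """Count every non-whitespace character in text."""
    return Counter(ch for ch in text if not ch.isspace())


def word_frequencies(text):
    """Count whitespace-separated words in text."""
    return Counter(text.split())


def inconsistent_spellings(words, ambiguous_sets):
    """Group words that differ only in characters from the same set.

    ambiguous_sets is a list of strings such as ['kq'], meaning that
    'k' and 'q' are treated as interchangeable when comparing words.
    """
    canonical = {}
    for group in ambiguous_sets:
        for ch in group:
            canonical[ch] = group[0]
    groups = defaultdict(set)
    for word in words:
        key = ''.join(canonical.get(ch, ch) for ch in word)
        groups[key].add(word)
    return [sorted(group) for group in groups.values() if len(group) > 1]


text = 'kitab dost qitab kitab'
print(character_inventory(text).most_common(3))
print(word_frequencies(text)['kitab'])
print(inconsistent_spellings(text.split(), ['kq']))
```

Because no dictionary is involved, the check only flags candidate pairs; deciding which spelling is correct remains the user's job.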
The Spelling Consistency Checker macro requires that all of the following software is installed:
- Operating system: Windows XP (or later)
- Word XP (or 2003)
- SIL Converters 2.2 program including the basic converters (as it will use the following converters: UTF8<>UTF16 and the Any-Latin). In case the converters are not set up already, the macro will automatically add them for you. If you are using legacy-encoded data (such as WinScript or Annapurna), then you will also need an appropriate Legacy>Unicode converter set up manually.
- The Field Linguist's Toolbox (formerly called Shoebox)—if working from a Word-list
- JumpToolbox (included in the zip file and must also be installed for the macros to work)
- Consistent Changes for Windows (CCW32.exe) if global changes are to be applied
Encoding Conversion Frequently Asked Questions and Known Issues
Frequently Asked Questions and Known Issues concerning conversion of legacy data to Unicode.
Structured data conversion
By the end of this tutorial you should be able to convert (roundtrip) structured data and test it by bringing it into various applications. Any issues discovered in this process should be fixed in the mapping files.
SIL IPA93 Data Conversion
Step-by-step instructions on how to convert Microsoft Word, text or Standard Format (sfm) documents that use SIL IPA93 fonts in order to use Unicode fonts.
© 2003-2017 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative. Contact us here.