This is an archive of the original scripts.sil.org site, preserved as a historical reference. Some of the content is outdated. Please consult our other sites for more current information: software.sil.org, ScriptSource, FDBP, and silfontdev



Short URL: https://scripts.sil.org/SILConverters22

SILConverters 2.2

Microsoft Word/COM support for TECkit, CC, and ICU

Bob Eaton, Mark Penny, 2006-04-28

Obsolete version

Please note that this product has been replaced by SILConverters 2.5, and you are strongly encouraged to use that product. This page is retained for those who, for whatever reason, cannot use the new version and need an older, unsupported version.

Upgrade!

SILConverters 2.2 was recently released in FieldWorks 3.1. A major interface change makes this version incompatible with version 2.0 and earlier (it can, however, coexist with SILConverters 2.1). The new enhancements are listed below.

This package provides a system-wide repository for encoding converters and transliterators (based on TECkit, CC, or ICU) and a simple COM interface for selecting and using a converter from the repository. It is easy to use from VBA, C++, C#, or any .NET- or COM-enabled language. An included VBA macro provides a simple interface for managing and using the repository, making it easy to convert any file (e.g. SFM texts, lexicons, and even Word documents) to a different encoding based on one or more TECkit maps and/or CC tables. The macro interface also lets you add user-developed converters to the repository and remove them.

This package is fully integrated with SIL FieldWorks, providing a single system-wide registry of installed and available encoding converters. The package also includes extra utilities, such as a clipboard converter that transforms text between cut and paste operations.

Major changes to SIL Converters 2.2

Three New Transducer Engines:

  • Perl Expressions

The Perl Expression plug-in lets you write Perl expressions that process text in EncConverter client applications, such as Microsoft Word, FieldWorks (new in v3.1), and AdaptIt (also new in v3.1). This plug-in is included in the basic package (i.e. SILConvertersSetup.exe below), but it requires a separate Perl 5.8 distribution to be installed.

Note

The Perl plug-in has been tested with the following freely available Perl distributions: ActiveState Perl and PXPerl.

The following Perl snippets show the kind of expressions that are possible:

# Reverse a string of text
$strOut = reverse($strIn);

# Turn every character except lowercase vowels and spaces into 'C',
# then every lowercase vowel into 'V'
$strInOut =~ s/[^aeiou ]/C/g;
$strInOut =~ s/[aeiou]/V/g;

  • Python Scripts

The Python Script plug-in lets you process text using Python functions in EncConverter client applications. This plug-in is also included in the basic package, but it requires a separate Python 2.4 distribution to be installed.

Note

The Python plug-in has been tested with the following freely available Python distributions: ActiveState Python and Python.org.

The following Python snippets show the kind of expressions that are possible:

# Turn a string of text into upper case
def ToUpper(s):
    return s.upper()

# Get the Unicode code point names for the input string
import unicodedata

def UnicodeNames(u):
    r = u''
    for ch in u:
        # characters without a name (e.g. controls) are shown as U+XXXX
        r += '%s; ' % unicodedata.name(ch, 'U+%04X' % ord(ch))
    return r

  • ICU Regular Expressions

In addition to the existing ICU Transliterator and Converter capabilities, version 2.2 adds ICU Regular Expression support, which lets you process text (primarily match and replace) with regular expressions in EncConverter client applications.
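To illustrate the kind of match-and-replace rule this makes possible, here is a minimal Python 3 sketch. It uses Python's re module as a stand-in for ICU's regular expression engine; the two agree on simple patterns like this one, but ICU writes a backreference in the replacement text as $1 rather than \1. The rule itself (marking a doubled vowel with the IPA length mark) is a hypothetical example, not a converter supplied with the package.

```python
import re

# Hypothetical rule: a doubled lowercase vowel ("aa") becomes the vowel plus
# the IPA length mark U+02D0. Python's \1 plays the role of ICU's $1.
DOUBLED_VOWEL = re.compile(r'([aeiou])\1')


def mark_length(text):
    """Replace every doubled lowercase vowel with vowel + length mark."""
    return DOUBLED_VOWEL.sub('\\1\u02d0', text)


print(mark_length('baana kooti'))  # baːna koːti
```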

New (parallel) Compound Converter type: Primary-Fallback

New in v2.2 is a Primary-Fallback converter, which lets you specify two converters that operate as follows:

  • If the primary converter changes the input string, the changed word is returned.
  • If the primary converter does not change the string, the fallback converter is called.

This can be useful for complex transliteration processes where a simple algorithmic process is not sufficient. The exceptional (lexical) cases can be implemented in the primary converter; if no match is found, the fallback (algorithmic) process is used.
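The Primary-Fallback behaviour described above can be modelled in a few lines of Python 3. This is a sketch of the logic only, not the EncConverters API: converters are plain functions, and the exception list and the 'ph' to 'f' rule are hypothetical stand-ins for a lexical primary converter and an algorithmic fallback.

```python
def primary_fallback(primary, fallback):
    """Build a converter that tries primary first and falls back if it changes nothing."""
    def convert(text):
        result = primary(text)
        if result != text:      # primary changed the word: return its output
            return result
        return fallback(text)   # otherwise use the fallback converter
    return convert


# Hypothetical lexical exceptions (primary) and algorithmic rule (fallback)
EXCEPTIONS = {'colonel': 'kernel'}


def lexical(word):
    return EXCEPTIONS.get(word, word)


def algorithmic(word):
    return word.replace('ph', 'f')


convert = primary_fallback(lexical, algorithmic)
print(convert('colonel'), convert('phone'))  # kernel fone
```

Because the test is whether the primary converter changed the word, not whether it found a match, an exception entry that maps a word to itself does not stop the fallback from running.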

Updated Clipboard EncConverter

New version supports:

  • Filtering based on Encoding ID (e.g. UNICODE or SIL-IPA93-2001), Implementation Type (e.g. SIL.PerlExpression), and/or Process Type (e.g. UnicodeEncodingConversion or Transliteration)
  • New menus to add, edit and delete converters in the System Repository (using the new Auto Configuration capability; see Self-Configuration below).

Updated SpellFixer application

New version supports:

  • Display Rule command: to quickly find Bad Spelling/Good Spelling pairs in the replacement table so you can modify them.
  • A warning is now displayed if you attempt to add a conflicting rule.
  • Correct Whole Document commands: four new ways to process Word documents:
    • word-by-word (as before)¹
    • word-by-word but only for data in a particular font.
    • Same as above, but using any arbitrary EncConverter; not just a SpellFixer project converter.
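The replacement-table idea behind SpellFixer, including the conflicting-rule warning and word-by-word correction, can be sketched in Python 3 as follows. The class and method names are hypothetical and do not reflect SpellFixer's actual code; a conflict is assumed here to mean a Bad Spelling that is already paired with a different Good Spelling.

```python
class ReplacementTable:
    """Hypothetical Bad Spelling -> Good Spelling table (not SpellFixer's real code)."""

    def __init__(self):
        self.rules = {}

    def add_rule(self, bad, good):
        existing = self.rules.get(bad)
        if existing is not None and existing != good:
            # Conflict: this bad spelling already has a different correction
            raise ValueError('conflicting rule: %r already maps to %r' % (bad, existing))
        self.rules[bad] = good

    def correct(self, text):
        """Word-by-word correction: only whole space-separated words are replaced."""
        return ' '.join(self.rules.get(word, word) for word in text.split(' '))


table = ReplacementTable()
table.add_rule('teh', 'the')
print(table.correct('teh cat sat on teh mat'))  # the cat sat on the mat
```

Splitting on single spaces keeps the sketch short; a word with attached punctuation (e.g. 'teh.') is not matched, so a real implementation needs proper word-boundary detection.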

Self-Configuration

Several new functions have been added through which EncConverters plug-ins provide their own user interface for adding and configuring converters (e.g. to simplify adding EncConverter support to different client applications).

Client applications can now call a function to launch the Select Converter dialog:

Select Converter Dialog



This dialog displays all the converters available in the System Repository and includes informational tool tips that give technical details about the converters.

In addition to selecting a converter, another function has been added for adding a new converter based on its implementation or transduction type (cf. clicking the Add New button above). This function brings up a dialog in which users select the transduction engine to use in creating the new converter:

Choose a Transduction Engine dialog



This dialog is used to launch implementation-specific setup dialogs, such as this one for Perl Expressions:

Perl Expression Converter Setup dialog



Each such configurator has an About help tab explaining the details of the converter type and a Test Area tab for checking the converter with sample data.

Developers wanting to add support for EncConverters using these new user-interface methods can see this webpage for details and code snippets.

New MapMaker Helper

Also available is a helper utility called MapMaker Helper, which can be configured to display up to two EncConverters to help with creating compound converters and/or to check the round-trip capability of a single (bi-directional) converter.

Here’s an image of the new utility with a bi-directional Hindi-to-Urdu transliterator. The third box shows the reverse direction, to check round-trip accuracy. The bottom window gives the Unicode names for the selected string, which you can copy and paste into a TECkit map.
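The round-trip check that the third box supports, and the Unicode names shown in the bottom window, can be approximated in Python 3 as below. The converter here is a hypothetical lossy mapping chosen to show a failure; the names come from Python's unicodedata module, and MapMaker Helper's own display format may differ.

```python
import unicodedata


def round_trips(forward, reverse, text):
    """True if converting forward and then in reverse gives back the input."""
    return reverse(forward(text)) == text


def unicode_names(text):
    """Unicode character names for text, falling back to U+XXXX for unnamed characters."""
    return '; '.join(unicodedata.name(ch, 'U+%04X' % ord(ch)) for ch in text)


# Hypothetical lossy converter: 'c' and 'k' both become 'K'
FWD = str.maketrans({'c': 'K', 'k': 'K', 'a': 'A', 't': 'T'})
REV = str.maketrans({'K': 'k', 'A': 'a', 'T': 't'})


def forward(s):
    return s.translate(FWD)


def reverse(s):
    return s.translate(REV)


print(round_trips(forward, reverse, 'kat'))  # True
print(round_trips(forward, reverse, 'cat'))  # False: 'cat' comes back as 'kat'
print(unicode_names('\u0915\u092e'))  # DEVANAGARI LETTER KA; DEVANAGARI LETTER MA
```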

Note

The MapMaker Helper utility requires Microsoft .NET Framework 2.0 and a Python distribution (discussed above) to be installed.

SIL Converters 2.2 MapMaker Helper dialog box



Bulk SFM File Converter

Also available is a utility, called SFM Converter, which can be used to convert one or more SFM documents using converters in the system repository.

Here’s an image of the new utility being used to convert an SFM document that is encoded with the non-Unicode font Annapurna.

SIL Converters 2.2 SFM Converter dialog box



This program can also convert Unicode-encoded SFM documents; simply choose the encoding of each file as you open and save it.

Note

The SFM Converter utility requires Microsoft .NET Framework 2.0 to be installed.

Installation and configuration

Note that the installation procedure for SILConverters is completely different from that of the old EncCnvtrs. It also differs from that of a typical Windows application:

  • You will need Administrator privilege on the computer to install this software.
  • The system consists of separately downloadable components. The primary (and required) component contains the main setup files and programs. The optional components contain collections of converters that may be of interest. In the future, new converters will be made available by packaging them as additional optional components.
  • When you run one of the downloaded components, it unpacks itself into a shared installer folder (SILConverters22) and then launches the setup program. The setup program then lets you configure exactly what programs and converters you want to be available on your system.
  • Do not delete the shared installer folder (SILConverters22). You may, now or in the future, want to download additional converter modules, and they use this shared installer folder and setup program as well.

Step 1: Uninstall previous versions

You should uninstall EncCnvtrs version 1.5 and/or 2.0 before installing SILConverters. Both the 1.5 and 2.0 versions have uninstallers listed in Control Panel / Add or Remove Programs. However, several people have had problems uninstalling them (perhaps due to a recent update to the Windows operating system). If your attempts to uninstall the "Encoding Converters" item(s) fail, see Uninstallation Guide for EncCnvtrs 1.5 and 2.0. SILConverters 2.1 can coexist with SILConverters 2.2; in fact, you may run into problems with some of your FieldWorks applications if you uninstall version 2.1.

Step 2: Get .NET

This package requires the Microsoft .NET Framework Version 1.1 Redistributable. This is a large download (22 MB), but it is supplied with a number of products, so you may already have it. If you are unsure whether you have it, look in Control Panel / Add or Remove Programs. Alternatively, simply try the next step of the installation; it will fail if .NET is not available.

Step 3: Install core components

Download and run the SILConvertersSetup program to create the shared install folder and install the main suite of utilities. When you are prompted for the folder in which to save the files, enter (or browse to) your 'Downloads' folder (e.g. C:\Downloads or My Documents\My Downloads). After unpacking the files, it automatically starts SetupSC.exe, the main installer, where you select which options to install:

SIL Converters Installer window



Use this configuration dialog to identify what add-ins, conversion maps, and document templates you want installed (or uninstalled). Some notes:

The Commit button both installs and uninstalls components. Whether an item is installed or uninstalled depends on the state of the checkbox beside it, which can have three possible values:

  • unchecked: the item is not installed or will be uninstalled.
  • grey: the item is installed.
  • checked: the item will be installed.
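One way to read the three checkbox states is as the action Commit takes for each item, sketched below in Python 3. The names are hypothetical, and treating a checked item that is already installed as needing no action is an assumption, not documented installer behaviour.

```python
from enum import Enum


class CheckState(Enum):
    UNCHECKED = 'unchecked'  # not installed, or to be uninstalled
    GREY = 'grey'            # currently installed
    CHECKED = 'checked'      # to be installed


def commit_action(state, installed):
    """Action Commit would take for one item (an assumed reading of the states)."""
    if state is CheckState.CHECKED and not installed:
        return 'install'
    if state is CheckState.UNCHECKED and installed:
        return 'uninstall'
    return 'no change'
```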

Many of the items have tips: if you hover the cursor over an item, a pop-up tip tells you more about it.

The Word Document Templates list includes all templates you have installed on your system, not just those that came with the SIL Converters package. (In the above example you see Hallissy Highlights A4.) While you can use this dialog to uninstall any templates you have, I do not recommend it unless you have a backup somewhere. The templates supplied with the package are Data Conversion Macro and SpellFixer.

For Word Document Templates, the installer looks in (and can install into) two different folders:

Folder | Usage
Templates | Templates installed here are not loaded automatically but can be made available using Word's Tools / Templates and Add-Ins... dialog.
Startup | Assuming Word's Tools / Macros / Security... / Security Level is set to Medium, templates installed here are loaded automatically every time you start Word.

Review all the components in the installer dialog. Check the components you want added to the installation and uncheck those you want removed. When ready, click Commit. When it has finished, click Cancel to close the dialog.

If you are installing Word Document Templates, you will get a warning that the templates have been installed only for the current user. Any other users who want access to the templates will need to run SetupSC.exe themselves.

Step 4: Download and install additional components

You may now install additional components such as converter modules for specific encodings. To install these:

  • Click the download link for the desired add-in and run it (you can either save the file to your computer and then run it, or run it directly). This loads the new items into the shared install directory (you didn't delete it, did you?²) and automatically launches the installer.
  • When the Install options window opens, you need to select the additional converters you want to install.
  • You can manually install additional TECkit converters that do not have an installer by following the instructions here.

Reconfiguring

At any time you can reconfigure what modules are installed and available by running the SetupSC.exe program from the shared install folder created in Step 3.

Using SILConverters from Microsoft Word

Click Tools / Data conversion.... The user interface is relatively simple to master:

SIL Converters Word Macro



Notice three distinct areas:

  1. Conversion table details: This is where you select one of the Conversion Tables from a list; you can also add new tables to the list.
  2. Scope of change: Here you can restrict the scope of the conversion (apply changes to the whole document, a selection, particular backslash markers, or a specific font).
  3. Target Data: Finally, you can optionally reformat the converted data by specifying a style or font.
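Restricting a conversion to particular backslash markers amounts to converting only the content of matching SFM fields. The Python 3 sketch below assumes that a field starts with a backslash marker at the beginning of a line and that lines without a marker continue the previous field; the function name and the use of str.upper as a stand-in converter are illustrative only, not the macro's actual code.

```python
import re

# A field starts with a backslash marker at the start of a line, e.g. "\lx kamala"
FIELD = re.compile(r'^(\\\S+)([ \t]?)(.*)$')


def convert_sfm(text, convert, markers):
    """Apply convert() to the content of fields whose marker is in markers."""
    out = []
    current = None  # marker (without backslash) of the field being read
    for line in text.split('\n'):
        m = FIELD.match(line)
        if m:
            marker, sep, content = m.groups()
            current = marker[1:]
        else:
            marker, sep, content = '', '', line  # continuation of previous field
        if current in markers:
            content = convert(content)
        out.append(marker + sep + content)
    return '\n'.join(out)


record = '\\lx kamala\n\\ge lotus\n\\lx kala'
print(convert_sfm(record, str.upper, {'lx'}))
```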

For further information about using the Word macros, use Windows Explorer to locate a copy of the Data Conversion macro template. If you still have the shared install directory (from Step 3 of the installation instructions), you can find a copy there. Right-click the .DOT file and select Open from the context menu; the template contains its own documentation.



Downloads

SIL Converters 2.2 Core components and installer
Bob Eaton, 2006-06-16
Download "SILConvertersSetup.exe", Windows application, 12MB [3787 downloads]
This core package contains the following items:
  • EncConverters repository (Manages and provides an API to the collection of converters/transliterators)
  • The CC, TECkit, ICU, Perl and Python run-time conversion engines.
  • Clipboard EncConverter Add-in (Allows for using an EncConverter on clipboard data)
  • SpellFixer Add-in (Adds programmatic search and replace capability to Microsoft Word)
  • Data Conversion Macro Template (GUI for adding and using converters in Microsoft Word)
  • A few sample converter maps and tables:
    • Hex-Any (if you see characters written as hexadecimal code point escapes, this converter turns them into correctly displaying Unicode, e.g. 'बंसखट')
    • UTF8UTF16 (if you see UTF-8 bytes displayed as strings of odd Latin characters, this converter turns them into correctly displaying Unicode, e.g. 'बंसखट')
    • null: a do-nothing converter that is useful for globally changing data in one font to another with the Data Conversion Macro in Word.
    • NFC: normalizes a Unicode string to fully composed form.
    • NFD: normalizes a Unicode string to fully decomposed form.
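What the Hex-Any, UTF8UTF16, NFC and NFD converters do can be approximated with the Python 3 standard library, as in this sketch. It is not how the package implements them: the escape formats accepted (\uXXXX and U+XXXX) and the assumption that the garbled UTF-8 was displayed as Windows-1252 characters are illustrative choices.

```python
import re
import unicodedata

# Java-style \uXXXX escapes or U+XXXX notation
HEX_ESCAPE = re.compile(r'\\u([0-9A-Fa-f]{4})|U\+([0-9A-Fa-f]{4,6})')


def hex_to_unicode(text):
    """Turn hexadecimal escapes into the characters they name."""
    def replace(m):
        cp = int(m.group(1) or m.group(2), 16)
        return chr(cp) if cp <= 0x10FFFF else m.group(0)  # leave invalid values alone
    return HEX_ESCAPE.sub(replace, text)


def utf8_shown_as_cp1252_to_unicode(text):
    """Undo UTF-8 bytes that were displayed as Windows-1252 characters."""
    return text.encode('cp1252').decode('utf-8')


def nfc(text):
    return unicodedata.normalize('NFC', text)  # fully composed


def nfd(text):
    return unicodedata.normalize('NFD', text)  # fully decomposed


print(hex_to_unicode('\\u092C\\u0902\\u0938\\u0916\\u091F'))
```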
The core download now also includes the applications and command-line utilities from the TECkit package. Applications such as DropTEC.exe and TECkit Mapping Editor.exe can be launched via menu items in the Start / All Programs / SIL Converters folder. The command-line utilities, such as TxtConv.exe and SFconv.exe, are installed in the C:\Program Files\Common Files\SIL folder.
SIL Converters 2.2 Indic Converters add-in
Bob Eaton, 2006-04-28
Download "SILConvertersIndic.exe", Windows application, 2MB [1935 downloads]

This additional component provides an "Indic-specific" set of converters and utilities for the SILConverters suite. This package contains support for:

  • ITrans to Unicode converters for the following languages:
    • Hindi (e.g. "hindii" converts to हिन्दी)
    • Bengali
    • Gujarati
    • Telugu
    • Tamil
    • Kannada
    • Oriya
    • Malayalam
  • UTrans converters for the Unicode presentation form of Urdu (Arabic)
  • ISCII to/from UNICODE converters (all Indic ranges)
  • Converters for the following fonts to both ISCII and Unicode encoding, based on the Font2Iscii converter set:
    • Devpooja
    • Devpriya
    • DV-TTYogesh
    • DVB-TTYogesh
    • Sanskrit-98
    • Shusha
    • Mithi
    • DVBW-TTYogesh
    • AkrutiDev1
    • Ankit
    • Devlys
    • Kruti46
    • Naidunia
    • Telugu-Hemalatha
    • Telugu-Hemalathab
SIL Converters 2.2 SAG Indic Converters add-in
Bob Eaton, 2006-04-28
Download "SILConvertersSAGIndic.exe", Windows application, 70KB [1277 downloads]

TECkit encoding converters (related to the Devanagari range of Unicode) for the following legacy Indic fonts:

  • Annapurna
  • Shusha
  • CDAC-ISFOC encoding (cf. WinScript, DV-TTYogesh)
  • Unicode Devanagari IPA (phonetic transliteration)

SIL Converters 2.2 ICU Transliterators add-in
Bob Eaton, 2006-04-28
Download "SILConvertersIcuTransliterators.exe", Windows application, 33KB [1396 downloads]

This additional component provides a collection of ICU-based "Latin" transliterators for the following ranges of Unicode:

  • Devanagari-Latin
  • Bengali-Latin
  • Gujarati-Latin
  • Gurmukhi-Latin
  • Kannada-Latin
  • Malayalam-Latin
  • Oriya-Latin
  • Tamil-Latin
  • Telugu-Latin
  • Arabic-Latin
  • Cyrillic-Latin
  • Greek-Latin
  • Han-Latin
  • Hangul-Latin
  • Hebrew-Latin
  • Hiragana-Latin
  • Katakana-Latin
  • Jamo-Latin
  • Any-Latin

Note that these transliterators can be daisy-chained together to transliterate between non-Latin scripts. For example, chaining the 'Devanagari-Latin' transliterator (in the forward direction) with the 'Tamil-Latin' transliterator (in the reverse direction) gives a 'Devanagari-Tamil' transliterator.

Use the Add button in the Data Conversion Macro to daisy-chain converters/transliterators together.
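The daisy-chaining idea can be sketched in Python 3 with toy transliterators. The three-letter mapping tables below are hypothetical and far simpler than the ICU rules (which also handle vowel signs, viramas and context); the point is only that running Devanagari-Latin forward and then Tamil-Latin in reverse yields a Devanagari-to-Tamil conversion.

```python
from functools import reduce


def make_transliterator(table):
    """Greedy longest-match transliterator built from a mapping table."""
    keys = sorted(table, key=len, reverse=True)

    def convert(text):
        out, i = [], 0
        while i < len(text):
            for key in keys:
                if text.startswith(key, i):
                    out.append(table[key])
                    i += len(key)
                    break
            else:
                out.append(text[i])  # no rule matched: copy the character
                i += 1
        return ''.join(out)
    return convert


def reverse_table(table):
    """Swap a table's direction (assumes its values are unique)."""
    return {v: k for k, v in table.items()}


def chain(*steps):
    """Run each converter on the output of the previous one."""
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)


# Hypothetical toy tables: KA, MA, LA in each script
DEVANAGARI_LATIN = {'\u0915': 'ka', '\u092e': 'ma', '\u0932': 'la'}
TAMIL_LATIN = {'\u0b95': 'ka', '\u0bae': 'ma', '\u0bb2': 'la'}

devanagari_to_tamil = chain(
    make_transliterator(DEVANAGARI_LATIN),            # forward direction
    make_transliterator(reverse_table(TAMIL_LATIN)),  # reverse direction
)
print(devanagari_to_tamil('\u0915\u092e\u0932'))
```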
SIL Converters 2.2 FindPhone Converters add-in
Bob Eaton, 2006-04-28
Download "SILConvertersFindPhone.exe", Windows application, 36KB [1574 downloads]

Adds the following converters for dealing with FindPhone encoded data:

  • FindPhone>SAG IPA93
  • FindPhone>UNICODE
SIL Converters 2.2 MapMaker Helper add-in
Bob Eaton, 2006-04-28
Download "SILConvertersMapMakerHelper.exe", Windows application, 537KB [1581 downloads]
This new plug-in can be used to configure up to two EncConverters to run automatically on text typed into an edit box: a helper utility for developing TECkit maps and/or compound converters.
SIL Converters 2.2 SFM Converter add-in
Bob Eaton, 2006-06-23
Download "SILConvertersSFMConverter.exe", Windows application, 539KB [1506 downloads]

This new utility can be used to convert one or more Standard Format Marker (SFM) documents using converters in the system repository. Works for both Unicode and non-Unicode encoded SFM documents.

Requires Microsoft .NET Framework 2.0 (in addition to the version 1.1 required for the base SILConverters component).
Spelling Consistency Checker Word Document Template
Mark Penny, 2006-07-26
Download "Consistent Spelling Checker 152sc.zip", ZIP archive, 685KB [1515 downloads]

This download contains a document template that provides a simple way of working with data (in any language and any script) in Microsoft Word documents, plain text files, or any Toolbox database to:

  • Check consistency of spelling (semi-automatically) based on linguistic principles
  • Apply global spelling changes:
    • to multiple documents which are currently open, or
    • by generating a CC table of changes to be applied to one or more plain text databases (such as Toolbox files)
  • Create a character inventory with frequency count
  • Create unique wordlists from one or more Word documents as:
    • a Word document table with frequency counts, or
    • a Toolbox (MDF-formatted) database for starting a lexicon
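A character inventory and a unique word list with frequency counts can be sketched with Python 3's collections.Counter. This is an illustration, not the template's macro code: words are assumed to be separated by whitespace, only ASCII punctuation is stripped, and sorting is by code point rather than by any language-specific alphabetical order. The MDF output uses the \lx (lexeme) marker.

```python
import string
from collections import Counter


def character_inventory(text):
    """Frequency count of every non-whitespace character."""
    return Counter(ch for ch in text if not ch.isspace())


def word_list(text):
    """Frequency count of whitespace-separated words, edge ASCII punctuation stripped."""
    words = (w.strip(string.punctuation) for w in text.split())
    return Counter(w for w in words if w)


def to_mdf(words):
    """One record per unique word, using the MDF lexeme marker \\lx."""
    return '\n\n'.join('\\lx ' + w for w in sorted(words))


text = 'aba ba, aba.'
print(character_inventory(text).most_common(2))  # [('a', 5), ('b', 3)]
print(to_mdf(word_list(text)))
```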

Warning

This tool is not a full-fledged spell checker. It does not use language-specific dictionaries, and therefore knows nothing about the languages it checks. It is only a consistency-checking tool based on phonological similarity or on sets of user-defined ambiguous characters.
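The idea of consistency checking with user-defined ambiguous characters can be illustrated with the Python 3 sketch below: spellings that become identical once each ambiguous character is replaced by a representative of its set are flagged for review. The character sets and function names are hypothetical, and the phonological-similarity rules the actual template uses are not reproduced here.

```python
from collections import defaultdict

# Hypothetical user-defined sets of characters that writers confuse
AMBIGUOUS_SETS = [set('iy'), set('ou')]


def skeleton(word):
    """Collapse each ambiguous character to one representative of its set."""
    out = []
    for ch in word.lower():
        for group in AMBIGUOUS_SETS:
            if ch in group:
                ch = min(group)
                break
        out.append(ch)
    return ''.join(out)


def possible_inconsistencies(words):
    """Groups of distinct spellings that share the same skeleton."""
    groups = defaultdict(set)
    for word in words:
        groups[skeleton(word)].add(word)
    return [sorted(g) for g in groups.values() if len(g) > 1]


print(possible_inconsistencies(['kitu', 'kytu', 'kito', 'mama']))
# [['kito', 'kitu', 'kytu']]
```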

Requirements

The Spelling Consistency Checker macro requires that all of the following software is installed:

  • Operating system: Windows XP (or later)
  • Word XP (or 2003)
  • SIL Converters 2.2, including the basic converters (the macro uses the UTF8<>UTF16 and Any-Latin converters). If these converters are not already set up, the macro adds them automatically. If you are using legacy-encoded data (such as WinScript or Annapurna), you will also need to set up an appropriate Legacy>Unicode converter manually.
  • The Field Linguist's Toolbox (formerly called Shoebox), if working from a word list
  • JumpToolbox (included in the zip file; it must also be installed for the macros to work)
  • Consistent Changes for Windows (CCW32.exe), if global changes are to be applied

Related resources

Encoding Conversion Frequently Asked Questions and Known Issues Lorna A. Priest, 2009-05-15
Frequently Asked Questions and Known Issues concerning conversion of legacy data to Unicode.

Structured data conversion Lorna Priest and David Rowe, 2003-03-03
By the end of this tutorial you should be able to convert (roundtrip) structured data and test it by bringing it into various applications. Any issues discovered in this process should be fixed in the mapping files.

SIL IPA93 Data Conversion Lorna A. Priest, 2009-02-16
Step-by-step instructions on how to convert Microsoft Word, text or Standard Format (sfm) documents that use SIL IPA93 fonts in order to use Unicode fonts.


1 The Correct Whole Document commands do not work on most legacy encoded fonts since Microsoft Word cannot detect word boundaries accurately except with Unicode data.
2 If you did delete the shared install directory that was created during Step 3, simply re-create it by repeating that step. When the installer dialog appears, press Cancel and try Step 4 again.

© 2003-2024 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Writing Systems Technology team (formerly known as NRSI). Read our Privacy Policy. Contact us here.