Computers & Writing Systems
SIL Converters 2.5 documentation
SIL Encoding Converters 2.5 Setup
Besides being available from the SIL Software CD-ROMs, SIL Converters (starting with version 2.5) has a new web-based Master Installer that lets you choose the features you want and then install them from the internet. The Master Installer is currently available from this link: http://downloads.sil.org/EncodingConverters/setup.exe. The Master Installer is a small program that runs on your machine and, in turn, runs a series of installers:
For further instructions on running the Master Installer and on the initial run of SIL Encoding Converters 2.5 Setup, refer to the installation documentation, SIL Converters 2.5 Installation. The following instructions assume you have already installed SIL Encoding Converters 2.5 and want to understand and use additional features of the program.
Initial dialog windows
Warning: If you run the .msi installer without the Master Installer, you will be shown the following warning:
Figure 2. Installer Warning
Note: If some functionality isn’t working after installing this way, try re-running the installation with the Master Installer first in order to ensure that your system has the necessary prerequisites.
Application maintenance
Once Encoding Converters is installed, the setup program displays an application maintenance screen that gives you the option to , or your installation.
Select Features
The following section briefly describes most of the boxes in Figure 1 and links you to further information about how to use the different utilities and applications for different text transduction tasks. The information is organized around the different components available in the SIL Converters Installer.
Feature overview
Figure 3. Select Features Tree
As you can see from Figure 3, there are four main categories of features that you can choose from when installing SIL Converters:
SIL Converters’ client applications
This feature node contains some of the programs at the top layer of Figure 1, which are generally of the most interest to end users. These programs and utilities allow you to convert text data (e.g. Word documents, SFM documents, XML documents, data on the system clipboard) using the text processing capabilities provided by the various transduction engines at the bottom of Figure 1.
Transduction Engines
This feature node contains the different transduction engine components that provide text processing capabilities at the lowest layer of Figure 1. Most users should accept the defaults for this feature to ensure that the proper transduction engines are installed. Otherwise, you must make sure you have the required transduction engines installed for the different text processing tasks you want to do. Examples:
Maps and Tables
This feature node contains several groupings of conversion maps and tables (e.g. for TECkit and/or CC) which provide the input to the transduction engines (e.g. the SIL IPA93<>UNICODE map). A few of the subfeature items are useful for all users, such as the Basic Converters and ICU Transliterators sets. Otherwise, you can install only those converter sets you expect to need (e.g. based on your entity). If you would like to add a package of converters to the SIL Converters’ installer, contact .
Additional TECkit applications
Since the SILConverters installer installs TECkit (a subfeature of the Transduction Engine feature node discussed above), this feature node adds the rest of the content of the TECkit download from the TECkit site (i.e. the documentation and other TECkit client applications). A new TECkit map Unicode Editor, which assists in the creation of TECkit maps, is also available from this feature node.
The following sections describe the sub-features available in each of these four nodes.
SIL Converters’ client applications
Figure 4. SIL Converters’ Client Applications
This installer installs the following SIL Converters client applications directly (see Figure 1). The FieldWorks and AdaptIt client applications have separate install programs.
Bulk SFM Converter
Use this application to convert the data in Standard Format Marker (SFM) fields using converters from the EncConverters’ repository and to convert the encoding of data in Shoebox, Toolbox, and Paratext (SFM) documents. You can also open multiple SFM documents for processing at the same time.
To use
Figure 5. SFM File Converter
Clipboard Encoding Converter
After starting up the Clipboard EncConverter, click the icon on the Windows Task Bar to convert text copied to the Windows clipboard.
To start it up
Clipboard Encoding Converter is a utility that you access from the Windows Task Bar. To use it, you first need to start it up.
To use
Clipboard Encoding Converter can be used in two ways:
Figure 6. EncConverter Clipboard Mode popup
To convert text using the Clipboard Mode
SpellFixer Mode
XML Data Converter
Use this application to convert the data (attributes or elements) in an XML document using converters from the EncConverters’ repository, for example:
To use
Figure 7. XML Data Converter Mappings
MS Word converters
You can use SILConverters directly in MS Word. The converters are macros contained in three Microsoft Word document templates (DOTs). These macros use the EncConverters repository to accomplish different tasks.
If you select the WordDOTs feature node, the SILConverters’ installer will put these templates into your Templates folder (normally C:\Documents and Settings\<user>\Application Data\Microsoft\Templates).
To use
Note: If multiple users on the machine want to use the document templates, you need to manually move the .DOT files to some common location, and each user will need to browse for them individually. If you want one or more of these document templates to start up automatically when Word starts, move them either to the current user’s Startup folder (i.e. C:\Documents and Settings\<user>\Application Data\Microsoft\Word\STARTUP) or, for all users, to the global startup folder (e.g. C:\Program Files\Microsoft Office\OFFICE11\STARTUP).
Data Conversion Macro
Use the Data Conversion Macro to convert text in any arbitrary Word document based on font name, style, or even the current selection, using converters from the EncConverters’ repository. It also supports SFM documents. Open the document template for more instructions.
Figure 8. Data Conversion Macro dialog
SpellingFixer
Use the SpellingFixer document template to correct misspelled words or make certain orthographic changes based on a user-defined database of bad-good spelling pairs. This is particularly useful when you want
Figure 9. Enter correction rules dialog
Once you have a database of such spelling fixes (or consistent changes), use one of the menu commands to go through all the words in the document to search for misspelled words. See the document template for instructions.
Consistency Spelling Checker
Use the Consistency Spelling Checker document template for a simple way of working with data (in any language and any script) in Microsoft Word documents, plain text files, or any Toolbox database to:
Figure 10. Spelling inconsistency parameter dialog
Note: This tool is not a full-fledged spelling checker. It does not use language-specific dictionaries, and therefore knows nothing about the languages it checks. It is only a consistency checking tool based on phonological similarity or sets of user-defined ambiguous characters.
Prerequisites
The Consistency Spelling Checker macro requires that you install this software:
SIL Converters’ Transduction Engines
Several of the transduction engines in Figure 1 are provided by the EncConverters’ repository object itself (i.e. the code page converter, the AdaptIt Knowledge Base Lookup, and the Compound and Primary-Fallback meta converters) and are always available. The rest depend on external programs (SIL and other Open Source programs), and their installation is optional, depending on your need. Most end users will not need to concern themselves with these details except to be sure that the necessary transduction engine is installed for the converters they want to use. Chances are that someone in your entity has already created a map file that you can use to convert the encoding of your data. In this situation, you need to be sure that you install the transduction engine required by the map or table that implements the conversion you want.
Figure 11. Optional SIL Converters’ Transduction Engines
TECkit
Other applications use TECkit, a low-level toolkit, to perform encoding conversions (e.g. when importing legacy data into a Unicode-based application). The primary component of the TECkit package is a library that performs conversions. This is the “TECkit engine”. The engine relies on mapping tables in a specific binary format (see TECkit documentation). A compiler creates such tables from a human-readable mapping description (a simple text file). In EncConverters, you can select either the compiled *.tec file or the uncompiled, human-readable *.map file to be the converter. If you choose the latter, EncConverters will automatically recompile an out-of-date .tec file when it is used to convert data. See Adding converters: TECkit map below for details about adding TECkit maps to the system repository.
Consistent Changes (CC)
Use Consistent Change tables to find all occurrences of specified characters, words, or phrases in a string of text, and then change them in a consistent way.
The change may be done in every occurrence or only when certain conditions are met. CC is like the find-and-replace feature in a text editor, but much more powerful. It allows you
SpellFixer is also available. This is a user-friendly graphical user interface for creating consistent change tables. This interface is primarily available via the SpellFixer.dot Microsoft Word document template mentioned above in SILConverters’ client applications. See Adding converters: Consistent Changes (CC) below for details about adding CC tables to the system repository.
International Components for Unicode (ICU) 3.4
This feature bundles three distinct EncConverters-related features together with other features of ICU used by other client applications, all of which must be installed as a unit. For SILConverters, three transduction engines are included in this feature:
Perl Expressions 5.8.7
The Perl Expressions 5.8.7 transduction engine allows you to write Perl expressions to do text processing in EncConverter client applications.
Note: This feature requires a separate Perl 5.8.7 distribution to be installed. The Perl plug-in has been tested with the following freely available Perl distributions: ActiveState Perl (http://www.activestate.com/solutions/perl/) or PXPerl. See below for a known issue with the PXPerl distribution. Also note that this plug-in will not work (yet) with the most recent v5.8.8 distribution.
Python Script Functions 2.4
The Python Script Functions 2.4 transduction engine allows you to do text processing using Python functions in EncConverter client applications.
Note: This feature requires a separate Python 2.4 distribution to be installed. The Python plug-in has been tested with the following freely available Python distributions: ActiveState Python or Python.org.
Note: This plug-in will not work (yet) with the most recent v2.5 distribution.
SIL Converters’ Maps and Tables
Most end-users are interested only in a small number of encodings. Typically, computer support people have created TECkit maps and/or CC tables for the various encodings used in each entity, relieving most end-users of having to create their own maps and tables. Because there are hundreds of possible encoding converters and transliterators that different end-users may be interested in, they are packaged into logically related groups of converters and are available via a two-step process.
Steps
Note: Installing maps and tables onto your computer with the SILConverters’ installer (step 1 above) will not make them available to EncConverters’ client applications unless you explicitly add them to the EncConverters’ repository using the Converter Installer or some other mechanism (see Adding converters).
Figure 12. Available optional maps and tables
The following sections give the details about fonts and encodings for the different maps and tables:
Basic Converters
Converters and transliterators common to all of SIL. This includes the following:
ICU Transliterators
This set contains configuration information for the following ICU transliterators, which are for Unicode encodings only.
Note: These are not the only transliterators available via the ICU Transliterator transduction engine; they are only a few of the pre-defined latinizing (or romanizing) transliterators that can be useful in different client applications for different ranges of Unicode.
Note: These transliterators can be daisy-chained together to transliterate between non-Latin scripts using a Compound meta-converter. For example, chaining the ‘Devanagari-Latin’ transliterator (in the Forward direction) with the ‘Arabic-Latin’ transliterator (in the Reverse direction) gives a ‘Devanagari-Arabic’ transliterator.
FindPhone to IPA converters
Adds the following converters for dealing with FindPhone-encoded data:
SAG Indic
Contains encoding converter map(s) for the following encoding/font:
Cameroon
Contains encoding converter map(s) for the following encoding/fonts:
West Africa
Contains encoding converter map(s) for the following encoding/fonts:
Eastern Congo Group
Contains encoding converter map(s) for the following encoding/fonts:
NLCI (India)
Contains encoding converter map(s) for the following encoding/font:
Additional TECkit applications
TECkit Map Unicode Editor
The TECkit map Unicode Editor is one more of the EncConverters’ client applications shown in Figure 1. Use this program to develop TECkit maps for encoding conversion or other text processing applications (e.g. transliteration).
Steps
Figure 13. TECkit Map Unicode Editor
Tips
Final dialog windows
Adding Converters to the System Repository
There are two primary ways of adding converters to the System Repository: by using either the
Converter Installer
If the converter you want to install into the system repository comes as part of the Maps and Tables features in the SILConverters installer (e.g. the SIL IPA93<>UNICODE converter that comes as part of the Basic Converters package), you can install it into the system repository by running the Converter Installer application.
How to get there
Figure 14. Converter Installer
Installing converters
For detailed instructions, see the Converter Installer section in the installation documentation.
Choose a Transduction Engine dialog box
If you have your own converter map (e.g. created with the TECkit map Unicode Editor), or one given to you that is not part of an installer feature, you can add it to the system repository via this dialog box.
Figure 15. Choose Transduction Engine dialog box
How to get there
Figure 16. Select Converter dialog
Transduction Engine Details
TECkit map
Figure 17. TECkit Setup
Consistent Changes (CC)
Result: The CC table Setup dialog will be displayed:
Figure 18. CC Table Setup
Tip: If it expects Unicode-encoded data, select that option or your data may be incorrectly converted. For Non-Unicode (byte) data, the default system code page will be used to convert your data when necessary.
Tip: Though it is primarily a Microsoft Word-based tool, you can use the SpellFixer application to create a CC table. Use the SpellFixer graphical user interface to configure Bad Spelling and Good Spelling couplets, which are then put into a CC table. The Microsoft Word document template also has macros for processing the text in a file word by word, so you can use it in a Find First/Next fashion to correct spelling errors. The SpellFixer.dot file has further usage information.
Note: You do not need to add a SpellFixer project to the System Repository, since it will be added automatically by the Project editor.
ICU Transliterator
Result: The ICU Transliterator Setup dialog will be displayed:
Figure 19. ICU Transliterator Setup
ICU Converters
The ICU Converter Setup dialog will be displayed:
Figure 20. ICU Converter Setup
Regular Expression Find and Replace (ICU)
ICU's Regular Expressions package provides applications with the ability to apply regular expression matching to Unicode string data. The regular expression patterns and behavior are based on Perl's regular expressions. See http://icu.sourceforge.net/userguide/regexp.html for more details on the syntax of ICU Regular Expressions.
The Regular Expression Find and Replace (ICU) Setup dialog will be displayed:
Figure 21. ICU Regular Expression Setup
Figure 22. Commonly used regular expressions pop-up
Regular Expression Metacharacters
Regular Expression Operators
Replacement Text
The replacement text for find-and-replace operations may contain references to capture-group text from the find. References are of the form $n, where n is the number of the capture group.
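To illustrate how $n references behave, here is a minimal sketch that uses Python's built-in re module rather than ICU itself. Python writes group references as \g<n> instead of $n, so the hypothetical helper icu_style_replace translates the $n form first. The helper name and the date pattern are invented for this example and are not part of SIL Converters or ICU.

```python
import re

def icu_style_replace(pattern, replacement, text):
    """Find and replace where the replacement text uses $n references.

    Illustration only: this runs on Python's re module, not ICU, and
    handles only plain $n references (no escaped dollar signs).
    """
    # Protect literal backslashes so re.sub does not interpret them.
    escaped = replacement.replace("\\", "\\\\")
    # Turn each $n into Python's unambiguous \g<n> group reference.
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", escaped)
    return re.sub(pattern, python_replacement, text)

# Three capture groups (year, month, day) are reordered in the output.
result = icu_style_replace(r"(\d{4})-(\d{2})-(\d{2})", "$3/$2/$1",
                           "Date: 2006-11-30")
print(result)  # Date: 30/11/2006
```

Here $1, $2 and $3 refer to the first, second and third parenthesized groups of the find pattern, counted from the left.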
Perl Expression
Note: This feature requires a separate Perl 5.8.7 distribution to be installed. The Perl plug-in has been tested with the following freely available Perl distributions: ActiveState Perl (http://www.activestate.com/solutions/perl/) or PXPerl. See Unable to add converters with PXPerl installed in the installation document for a known issue with the PXPerl distribution.
The Perl Expression Setup dialog will be displayed:
Figure 23. Perl Expression Setup
Python Script Function
Note: This feature requires a separate Python 2.4 distribution to be installed. The Python plug-in has been tested with the following freely available Python distributions: ActiveState Python or Python.org.
If you want to add a Python script function converter to the system repository,
The Python Script Setup dialog will be displayed:
Figure 24. Python Script Setup
    def ChangeLanguage(sLang, uI):
        # sLang is the additional (static) parameter; uI is the input data
        if not isinstance(uI, unicode):
            raise UnicodeError(u'Input Data not Unicode! (%s)' % uI)
        else:
            # assumed default (not in the original): pass other languages through unchanged
            uO = uI
            if sLang == u'Chinese':
                # do some Chinese processing and put result in uO
                uO = ProcessChinese(uI)
            return uO

The field for the additional parameter would be enabled, and you could enter the fixed string that the function tests for in order to trigger the script properly.
Note: If you have more than one additional parameter, the static strings should be separated by a semicolon (i.e. ";").
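The semicolon rule can be pictured with a small sketch. The names transform and call_with_static_params are hypothetical, and how EncConverters actually invokes a script function is not shown in this document. The sketch assumes that the static parameters are passed first and the input data last, as in ChangeLanguage. It is written to run on a current Python, whereas the plug-in itself targets Python 2.4.

```python
def transform(case_mode, suffix, data):
    """Hypothetical script function taking two static parameters."""
    if case_mode == "upper":
        data = data.upper()
    elif case_mode == "lower":
        data = data.lower()
    return data + suffix

def call_with_static_params(func, static_params, data):
    """Split a semicolon-separated static string and call func.

    Assumption: static parameters come first, input data comes last.
    """
    args = static_params.split(";") if static_params else []
    return func(*(args + [data]))

# The static string "upper;!" supplies case_mode and suffix.
print(call_with_static_params(transform, "upper;!", "hello"))  # HELLO!
```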
The Setup tab also has the following options:
Note: If a particular function allows additional (static) parameters, then the proper order of parameters will also be shown in this window.
AdaptIt Knowledge Base Lookup Converter
The AdaptIt Knowledge Base Lookup transduction engine is contained in the EncConverters assembly itself and therefore is always available without requiring an installer selection. This transducer allows you to do lookups on words in either the adaptation or glossing Knowledge Base of an AdaptIt project. To add an AdaptIt Knowledge Base Lookup converter to the system repository:
The AdaptIt Knowledge Base Converter Setup dialog will be displayed:
Figure 25. AdaptIt Knowledge Base Lookup Converter Setup
Note: For an AdaptIt Transliteration Project, the transliteration data will be in the normal project Knowledge Base file, not the Glossing Knowledge Base. However, it is possible to access a Glossing Knowledge Base if desired. Note that if you access an Adaptation Knowledge Base (i.e. from an AdaptIt project used to adapt texts from one language to another, which will most likely contain ambiguities), then the converter will return a string containing all the ambiguities for the given lookup word in the Ample ambiguity format (i.e. %count%form1%form2%...%). For example, if your AdaptIt Knowledge Base has an ambiguity for the word /से/ 'from, with' in the source language, which sometimes means /ते/ 'from' and sometimes /कन्ने/ 'with' in the target language, then if you attempt to process the word /से/ with this converter, it will return the string /%2%ते%कन्ने%/. If you have such values in a document readable by Microsoft Word, then you can use the Word Pick Document Template to simplify disambiguating these tokens.
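If you post-process converter output yourself, the %count%form1%form2%...% format is easy to take apart. The following is a minimal sketch; parse_ambiguity is a hypothetical helper, not part of EncConverters, and it assumes that the forms themselves never contain a percent sign.

```python
def parse_ambiguity(value):
    """Return the list of forms in an Ample-style ambiguity string.

    A value that is not in %count%form1%form2%...% form is treated as
    an unambiguous result and returned as a one-item list.
    """
    if len(value) < 2 or not (value.startswith("%") and value.endswith("%")):
        return [value]
    fields = value[1:-1].split("%")
    if not fields[0].isdigit():
        return [value]
    count = int(fields[0])
    forms = fields[1:]
    if count != len(forms):
        raise ValueError("count %d does not match %d forms" % (count, len(forms)))
    return forms

print(parse_ambiguity("%2%ते%कन्ने%"))  # ['ते', 'कन्ने']
```

The count field is checked against the number of forms so that a malformed string is reported rather than silently misread.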
Compound “daisy-chained” converters
The two final converter types to be discussed are actually meta converter types; that is, they allow you to combine two or more existing converters in the system repository in a serial or parallel fashion. The Compound Converter type can be used to combine two or more converters in a serial fashion so that the output of one step automatically becomes the input to the next step. This can be helpful when you have multiple, different conversions to apply to your data to get it into the form you ultimately need, without requiring separate conversions. For example, you may have one converter that goes from FindPhone IPA to SIL-IPA93, and another converter that converts from SIL-IPA93 to Unicode IPA. In order to perform the end-to-end conversion from FindPhone IPA to Unicode IPA, you can create a daisy-chain of the two existing converters (a “virtual converter”) so that the data is converted in one step.
Note: When creating or using a compound converter, all n+1 converters must be in the system repository (i.e. the n steps plus the compound converter itself). If you create a compound converter and then subsequently delete one of the steps, it will not work.
To add a Compound converter to the system repository:
The Compound (daisy-chained) Converter Setup dialog will be displayed:
Figure 26. Compound Converter Setup
Note: Compound converters may not be temporary converters.
Note: The converter friendly name you enter here is for the Compound converter itself, which is distinct from the names of the converter steps. For Compound converters, the default name will be a concatenation of the individual steps’ names. However, you can change it to something more meaningful if desired (e.g. “Devanagari to Arabic”).
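The serial chaining performed by a Compound converter can be sketched in a few lines. This is only a conceptual illustration, not how EncConverters is implemented: make_compound is a hypothetical helper, and the two step functions use invented placeholder mappings rather than real FindPhone or IPA93 data.

```python
def make_compound(steps):
    """Return a converter that runs each step in order (serial chaining)."""
    def compound(text):
        for step in steps:
            # The output of one step becomes the input to the next.
            text = step(text)
        return text
    return compound

# Invented stand-ins for two existing converters.
def step_one(text):
    return text.replace("A", "B")

def step_two(text):
    return text.replace("B", "C")

end_to_end = make_compound([step_one, step_two])
print(end_to_end("A-A"))  # C-C
```

Because the compound holds references to its steps, removing a step breaks the chain, which mirrors the note above about keeping all n+1 converters in the repository.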
Primary-Fallback Compound Converter
The Primary-Fallback Compound Converter type allows you to specify two existing converters: one to be a primary, and the other, a fallback converter. The configured primary converter is first called to do a conversion. If the primary converter doesn’t change the data, then, and only then, the fallback converter is called. This can be useful for transliteration where a character-based transliterator (e.g. TECkit, ICU, or CC) does most of the work, but certain words (or character sequences) are otherwise unpredictable from the context. In this case, you might want a lexicon-based approach to supply the special case forms. In this scenario, you would configure the lexicon-based transliterator (e.g. a SpellFixer CC table or an AdaptIt Knowledge Base Lookup converter) to be the primary converter and the character-based transliterator as the fallback converter. If the text isn’t modified by the primary converter (i.e. if it isn’t an exception), then the fallback converter is called to do the conversion.
To add a Primary-Fallback converter to the system repository:
The Primary-Fallback Converter Setup dialog will be displayed:
Figure 27. Primary-Fallback Converter Setup
Note: Primary-Fallback converters may not be temporary converters.
Figure 28. Enter Converter Name dialog
Note: The converter friendly name you enter here is for the combined Primary-Fallback converter itself, which is distinct from the names of the Primary and Fallback converters.
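The primary-then-fallback rule described in this section can be sketched as follows. This is a conceptual illustration only: make_primary_fallback is a hypothetical helper, the exception lexicon and the character rule are invented, and the real converters would be entries in the system repository.

```python
def make_primary_fallback(primary, fallback):
    """Call primary first; call fallback only if primary changed nothing."""
    def convert(text):
        result = primary(text)
        if result != text:
            return result       # primary handled it (an exception form)
        return fallback(text)   # unchanged, so use the general rules
    return convert

# Invented lexicon of special-case forms (primary converter).
EXCEPTIONS = {"colour": "color"}

def lexicon_lookup(word):
    return EXCEPTIONS.get(word, word)

# Invented character-based rule (fallback converter).
def character_rules(word):
    return word.replace("ph", "f")

convert = make_primary_fallback(lexicon_lookup, character_rules)
print(convert("colour"))  # color
print(convert("phone"))   # fone
```

Note that "unchanged" is the only signal used to trigger the fallback, so a primary converter that legitimately returns its input unmodified will still hand the text on to the fallback.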
Saving the converter in the System Repository
On any of the Transduction Engine Setup dialogs, by default, if you click or , the configured converter will be returned to the client application as a temporary converter; once the client application (e.g. FieldWorks or Word) is closed or releases the converter, it will no longer be available. If you want the converter to be permanently available to client applications, then you must explicitly add it to the System Repository using the button (or the button when editing a map).
Figure 29. Enter Converter Name dialog box
Figure 30. Advanced Configuration dialog box
Though these values are not necessary for the operation of the converter, they can be helpful to various client applications. For example, the Clipboard EncConverter can be configured to filter the list of displayed converters based on the Encoding Name and/or the Transduction Type configured here.
Known Issues
SIL Converters has the option of displaying all fonts in a Word document. However, it shows only the fonts that are installed on your computer, and it does not warn you about fonts that are used in the document but are not currently installed. You can find out what fonts the Word document needs that are not on your
© 2003-2024 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.