
NRSI: Computers & Writing Systems

Short URL: http://scripts.sil.org/TECkit

TECkit

a Text Encoding Conversion toolkit

TECkit is a low-level toolkit intended to be used by other applications that need to perform encoding conversions (e.g., when importing legacy data into a Unicode-based application). The primary component of the TECkit package is therefore a library that performs conversions; this is the "TECkit engine". The engine relies on mapping tables in a specific binary format (for which documentation is available); there is a compiler that creates such tables from a human-readable mapping description (a simple text file).
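As a rough illustration of what a table-driven conversion does, here is a minimal C++ sketch. It assumes a simple one-to-one table from legacy byte values to Unicode code points. The table type, the function names and the sample entry at 0xB9 are all hypothetical. This is not TECkit's binary table format or its engine API, and real TECkit mappings can also express multi-character and context-sensitive rules that a plain lookup table cannot.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory table: one Unicode code point per legacy byte value.
// NOT TECkit's compiled table format; it only shows the table-driven idea.
using ByteToUnicodeTable = std::array<char32_t, 256>;

// Start from an identity mapping (byte N -> U+00NN); callers override entries.
ByteToUnicodeTable makeIdentityTable() {
    ByteToUnicodeTable table{};
    for (std::size_t i = 0; i < table.size(); ++i) {
        table[i] = static_cast<char32_t>(i);
    }
    return table;
}

// Convert legacy bytes to Unicode code points by looking each byte up in the table.
std::u32string convertBytes(const std::vector<std::uint8_t>& input,
                            const ByteToUnicodeTable& table) {
    std::u32string output;
    output.reserve(input.size());
    for (std::uint8_t b : input) {
        output.push_back(table[b]);
    }
    return output;
}
```

For example, suppose a legacy font is assumed to place U+014B LATIN SMALL LETTER ENG at byte 0xB9. After setting `table[0xB9] = U'\u014B'`, converting the bytes 0x61 0xB9 0x61 yields `U"a\u014Ba"`.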

New version, 7 April 2008

Version 2.5.1 of the TECkit conversion library and mapping compiler is now available at the Downloads page. This version supports the Unicode 5.1 character repertoire, and also includes minor bug fixes.

To facilitate the development and testing of mapping tables for TECkit, the current package also includes several applications: simple tools for applying conversions to plain-text and Standard Format files, and both command-line and simple GUI versions of the TECkit compiler. However, these tools are not intended to be the primary means by which end users perform conversions, and they have not been designed, tested, and debugged to the extent that general-purpose applications should be.

There are some tutorial materials available to help in learning to use the TECkit package.

TECkit: Recent changes
This page notes significant changes made to the TECkit package and website. These notes are not guaranteed to be exhaustive!

TECkit Downloads
Here is where the current version of TECkit can be obtained. Version 2.5.1, released 7 April 2008, updates the supported character repertoire (for normalization and character names) to Unicode 5.1.

TECkit: Notes for Developers
These notes should help with Frequently Asked Questions from Developers.

SILConverters 3.1.1
This package provides tools through which you can change the encoding, font, and/or script of text in Microsoft Word and other Office documents, XML documents, and SFM text and lexicon documents. It also installs a system-wide repository to manage your encoding converters and transliterators (TECkit, CC, ICU, Perl, or Python-based, as well as support for adding custom transduction engines).

Note

The TECkit package is copyright ©2002-2008 SIL International. It is being made available as free software but without any warranty; see the license for more information.






Note: the opinions expressed in submitted contributions below do not necessarily reflect the opinions of our website.

"Tim", Mon, Apr 24, 2006 16:56 (CDT)

Slight TxtConv limitation

First I want to say what a great tool this is. Extremely useful!

I have an odd situation localizing some existing software, where I need to convert a UTF16 file to a Windows codepage file. The quirk is that I need the output file to still be UTF16, not a byte file. In other words, I need the conversion from Unicode to a codepage, but the output must still be a double-byte file. I hope this makes sense.

I can accomplish this quite easily with TxtConv by creating a Unicode-only mapping pass. But TxtConv cannot use the same mapping file for a normal Unicode-Byte conversion, even though the actual mapping is identical from a "value to value mapping" standpoint. This is only a slight annoyance, since I can make a copy of the mapping file and change the Unicode-only pass to a Unicode-Byte pass. However, it might be a slight improvement if the -if & -of options (or some new options) could override this limitation and just ignore the mapping's Unicode/byte enforcement.

jonathan, Tue, Apr 25, 2006 04:15 (CDT)

Re: Slight TxtConv limitation

If I understand this, you're wanting to insert a NUL (0) byte after each of the byte values in the "codepage" data, so that they can be read as 16-bit elements?

It seems to me this is a sufficiently unusual (I'm tempted to write "bizarre"!) requirement that it doesn't belong in a general-purpose tool; it's not a standard way to represent the encoded text, but must be specific to a certain piece of software or a particular process.

You can get there in a couple of ways using TECkit: either using a Unicode<>Unicode mapping, where one side isn't really Unicode but your codepage values (as you've tried); or by using a Unicode<>Byte mapping that explicitly inserts the extra 0 bytes next to each code. But it's not a standard encoding form, and so the tool won't automatically transform your codepage values into this form.
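Outside TECkit, the "insert a 0 byte after each code" transformation described above is simple to do as a post-processing step. The following is a minimal C++ sketch with a hypothetical function name. It assumes little-endian 16-bit output (the low byte comes first, then the zero byte), which matches "a NUL byte after each byte value". A big-endian consumer would need the zero byte first. The result is not genuine UTF-16 text; it is code-page values stored in 16-bit slots.

```cpp
#include <cstdint>
#include <vector>

// Widen each 8-bit code-page value into a 16-bit little-endian unit by
// writing a 0x00 high byte after it. The output is code-page values in
// 16-bit slots, NOT real UTF-16 text.
std::vector<std::uint8_t> widenToUtf16LEUnits(const std::vector<std::uint8_t>& codepageBytes) {
    std::vector<std::uint8_t> out;
    out.reserve(codepageBytes.size() * 2);
    for (std::uint8_t b : codepageBytes) {
        out.push_back(b);     // low byte: the code-page value itself
        out.push_back(0x00);  // high byte: always zero
    }
    return out;
}
```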

"Tim", Tue, Apr 25, 2006 13:27 (CDT)

Re: Slight TxtConv limitation

I can understand why you might not want this sort of change in the tool; it's just a suggestion. But TxtConv is really just a mechanism for executing a mapping between integer values. The ability to specify a physical encoding for the input and output data is really a separate concept. Like you said, I can achieve (and have achieved) the results I want by using a Unicode-only pass. I was trying to look at things in a different light: why couldn't I use the same mapping file to accomplish a UTF16-to-Byte conversion? The mapping is identical; it's just the encoding that differs. TxtConv and the TECkit mapping language are great generic tools, except for this slight limitation.
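The separation Tim describes can be pictured as two independent steps. The first applies a value-to-value mapping. The second chooses how to serialize the resulting integers. Below is a minimal C++ sketch of that idea. `ValueMap`, `applyMapping` and `asSixteenBitUnits` are hypothetical names; this is not how TxtConv or the TECkit engine is structured internally. A production converter would also need a policy for unmapped characters, whereas this sketch simply throws.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical value-to-value mapping: Unicode scalar -> code-page value.
using ValueMap = std::map<char32_t, std::uint8_t>;

// Step 1: apply the mapping, producing plain integer values.
// Throws std::out_of_range (from map::at) for an unmapped character.
std::vector<std::uint8_t> applyMapping(const std::u32string& text, const ValueMap& mapping) {
    std::vector<std::uint8_t> values;
    values.reserve(text.size());
    for (char32_t c : text) {
        values.push_back(mapping.at(c));
    }
    return values;
}

// Step 2a: the byte form is just the mapped values as they are.
// Step 2b: the 16-bit form stores the same values in 16-bit units.
std::vector<char16_t> asSixteenBitUnits(const std::vector<std::uint8_t>& values) {
    return std::vector<char16_t>(values.begin(), values.end());
}
```

The same `applyMapping` result feeds either output form, so one mapping covers both the Unicode-to-Byte case and the "codepage values in 16-bit units" case.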

"Bob Eaton", Mon, May 8, 2006 22:40 (CDT) [modified by martinpk on Tue, May 9, 2006 03:05 (CDT)]

Re: Slight TxtConv limitation

One way to do this (if you don't mind working in Microsoft Word) is the following:

1. Install the SILConverters core component package (with which you'll be able to use your TECkit maps to process documents editable in Word, although you won't need TECkit for this particular procedure). Follow the instructions at http://scripts.sil.org/EncCnvtrs to install it.

2. Be sure to install the Data Conversion Macro when you get the Options Installer dialog.

3. Open one of your Unicode-encoded documents in Word and select some text to convert.

4. Click Tools, Templates and Add-ins..., and then click the Add button to browse for the Data Conversion Macro .dot file (in C:\Documents and Settings\<username>\Application Data\Microsoft\Templates by default). Click OK to return to Word.

5. Click Tools, Data Conversion to bring up the Data Conversion dialog box.

6. Click the Select button to bring up the Select Converter dialog.

7. Click the Add New button and then double-click the Code Page Converter item.

8. On the Setup tab of the resulting dialog, choose the code page you want and then click the Save in System Repository button to save that converter into the system repository. Then click OK to return to the Select Converter dialog.

9. With your new converter selected in the upper pane, click the Reverse direction checkbox in the bottom half. Then click OK to return to the Data Conversion dialog in Word.

10. With your text selected (or configured appropriately in the Scope of change area of the Data Conversion dialog box), click OK to convert your Unicode encoded text to the code page encoding.

11. When it's all complete, click File, Save As... with the Save as type set for Plain Text.

12. When the File Conversion dialog box is then displayed, click the Other encoding radio button and then select Unicode from the list.

Once the file is saved as a text file, it will be code page bytes stored in UTF-16 format. Beware, however, that it'll have the UTF-16 BOM at the beginning.
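If the program consuming the file cannot cope with that BOM, it can be stripped after saving. The following minimal C++ sketch uses a hypothetical function name. It assumes Word's "Unicode" option produced UTF-16 little-endian output, so the BOM is the byte pair 0xFF 0xFE. A big-endian file would start with 0xFE 0xFF instead, which this sketch leaves alone.

```cpp
#include <cstdint>
#include <vector>

// Remove a leading UTF-16 little-endian byte order mark (0xFF 0xFE), if present.
// Data without that exact prefix is returned unchanged.
std::vector<std::uint8_t> stripUtf16LEBom(const std::vector<std::uint8_t>& data) {
    if (data.size() >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return std::vector<std::uint8_t>(data.begin() + 2, data.end());
    }
    return data;
}
```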

If you want to use TECkit, just change step 7 to add a TECkit map instead.

"Bob Eaton", Tue, May 9, 2006 23:14 (CDT)

Re: Slight TxtConv limitation

In fact, it's even easier than that (since you're specifically trying to deal with a code page conversion). With the following procedure, you don't even need SILConverters (i.e. you can do it totally within Word):

1. Open one of your Unicode-encoded documents in Word.

2. In Tools, Options, go to the General tab and make sure the Confirm conversion at Open checkbox is checked.

3. Click File, Save As... with the Save as type set for Plain Text.

4. When the File Conversion dialog box is then displayed, click the Other encoding radio button and then select the entry from the list that corresponds to the code page you want.

5. Re-open the text file in Word, and the Convert File dialog box will be displayed. Choose Encoded Text, and the File Conversion dialog box will be displayed again. This time, however, choose the Windows (Default) option.

6. Now, you can complete steps 11 and 12 from my previous note above to save the document in the same format as above.

"Leroy Vargas", Sun, Aug 6, 2006 19:22 (CDT)

SIL software/freeware that includes TECkit package

So far, I know SILConverters 2.2 includes the full TECkit package, and SILConverters 2.2 is itself included in SIL FieldWorks 3.1, according to this Web site.

Which other SIL software/freeware applications also include TECkit (or, better, SILConverters 2.2)?

"Bob Eaton", Fri, Feb 23, 2007 00:10 (CST)

Re: SIL software/freeware that includes TECkit package

AdaptIt (as of version 3.1) now supports using a user-configurable SILConverter (TECkit or otherwise) to preprocess the source language word when filling in the target language text box. That is, if the source word isn't already in the AdaptIt knowledge base, AdaptIt normally just puts the same word in the target text box. If you configure a SILConverter to be used, it will run the source language word through the SILConverter before filling in the target text box with the result of the conversion.

This is useful, for example, if there are consistent orthographic changes between the source and target languages (though, this functionality was already available in AdaptIt due to its native support of CC).
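The fallback logic described above can be summarized in a few lines. This is a minimal C++ sketch, not AdaptIt's actual code. The `KnowledgeBase` and `Converter` types and the `initialTargetText` function are hypothetical stand-ins for the knowledge base and a configured SILConverter.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins: stored source->target adaptations, and a converter
// (for example one wrapping a TECkit map) that transforms a single word.
using KnowledgeBase = std::unordered_map<std::string, std::string>;
using Converter = std::function<std::string(const std::string&)>;

// Choose the initial contents of the target text box for one source word.
std::string initialTargetText(const std::string& sourceWord,
                              const KnowledgeBase& kb,
                              const Converter& converter) {
    auto it = kb.find(sourceWord);
    if (it != kb.end()) {
        return it->second;             // known word: reuse the stored adaptation
    }
    if (converter) {
        return converter(sourceWord);  // unknown word: preprocess via the converter
    }
    return sourceWord;                 // no converter configured: copy the word as-is
}
```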

I'm also using it to preprocess the source word into a different script using a TECkit-based transliterator. That is, a TECkit-based transliterator converts between Devanagari and Urdu (Arabic) Unicode, so the target word appears in Urdu (Arabic) script.

I believe that Lexique Pro has native support for TECkit, and SpeechAnalyzer is due to have support for SILConverters. There is also native support for TECkit in different SIL software developed in Perl (search this site).

"Glen Wilson", Thu, Jun 19, 2008 18:16 (CDT)

TECkit install

Hello,

I have been trying to install TECkit on Ubuntu Linux 8.04 without success. I have followed the ./compile, make, make install route to installation, but I am always left with the message:

glen@dell-desktop:~/Desktop$ teckit_compile tex-text-long-s.map

teckit_compile: error while loading shared libraries: libTECkit_Compiler.so.0: cannot open shared object file: No such file or directory

I'm sorry, but this is the first time I've compiled a program from source and I don't know what to do from here.

Thanks for your help,

Glen M. W.

"purnendu", Sat, Mar 13, 2010 09:25 (CST)

teckit installation

I could not compile TECkit-2.5.1 on openSUSE 11.2 with gcc 4.4.1. I get the following messages:

.......................................................
../source/Compiler.cpp: In function 'char* TECkit_GetTECkitName(UInt32)':
../source/Compiler.cpp:212: error: 'sprintf' was not declared in this scope
../source/Compiler.cpp: In function 'const char* asHex(UInt32, short int)':
../source/Compiler.cpp:349: error: 'sprintf' was not declared in this scope
../source/Compiler.cpp: In function 'const char* asDec(UInt32)':
../source/Compiler.cpp:357: error: 'sprintf' was not declared in this scope
make[2]: *** [Compiler.lo] Error 1
.......................................................

"Nicolas Christener", Sun, Nov 20, 2011 07:42 (CST)

Re: teckit installation

This should fix the mentioned build error:

--- source/Compiler.cpp.orig 2011-11-20 13:32:19.042098126 +0100
+++ source/Compiler.cpp 2011-11-20 13:32:52.921046035 +0100
@@ -28,6 +28,7 @@ Description:
 */
 #include "Compiler.h"
+#include "stdio.h"
 #include <iostream>
 #include <iomanip>
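The general rule behind this patch is to include <cstdio> (or <stdio.h>) in every file that calls sprintf or snprintf, rather than relying on other headers such as <iostream> to pull it in. Starting with 4.3, GCC trimmed many indirect includes in its C++ standard library headers. That is presumably why code that once built cleanly now fails with gcc 4.4.1. The sketch below is a hypothetical helper in the spirit of asHex(), not TECkit's actual implementation. It uses std::snprintf, which bounds the write to the buffer size.

```cpp
#include <cstdint>
#include <cstdio>   // declares std::snprintf; include it explicitly rather than indirectly
#include <string>

// Hypothetical helper: format a value as zero-padded uppercase hexadecimal.
// Output longer than the buffer is truncated safely by snprintf.
std::string toHexSketch(std::uint32_t value, int digits) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%0*X", digits, static_cast<unsigned int>(value));
    return std::string(buf);
}
```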




© 2003-2014 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative. Contact us at .