NRSI: Computers & Writing Systems

Short URL: http://scripts.sil.org/RTF2SFM

RTF2SFM — converting styled Word documents to SFM

Bob Hallissy, 2009-10-16

RTF2SFM converts a styled Word .RTF file to UTF8-encoded SFM. Unlike the old SF Converter package, RTF2SFM correctly handles Unicode characters.

RTF2SFM is part of the SIL::RTF Perl module that provides an event-driven parser for examining or processing RTF files. The program is supplied either as a Perl module (requiring Perl 5.8 or later) or as a standalone Windows .EXE file.

New version: 1.10 (2009-10-16)

See the change history for details.


Synopsis

RTF2SFM [-s] [-q] [-c ControlFile] [-o outFile] [-a annotationFile] inFile

RTF2SFM -p [-o outFile]

RTF2SFM [-h] [-v]

Converts a styled Word .RTF file to UTF8-encoded SFM.

Options

  • -c names an options configuration file
  • -o names an output file (otherwise writes to STDOUT)
  • -a names an output file to hold annotations (comments, revision tracking)
  • -s suppresses the extra processing needed to convert Insert Symbol fields
  • -p outputs the built-in control file to outFile or STDOUT
  • -q enables quiet mode (no % completion or 'done' message)
  • -h outputs an extended help message
  • -v outputs version information

If -c is not supplied, RTF2SFM looks for a file named RTF2SFM.INI in the current directory. If RTF2SFM.INI isn't found, it uses a built-in standard set of options (based on SFConverter). Note: .INI files are assumed to be UTF-8! The control file has a syntax like that of a Windows .INI file. Use the -p option or see Downloads for working examples.
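The lookup order just described can be sketched in Python. This is an illustration only, not RTF2SFM's actual code: the function name choose_control_file is hypothetical, and returning None to stand for "use the built-in options" is an assumption made for this sketch.

```python
import os

def choose_control_file(c_option=None, cwd="."):
    """Return the control file path to use, or None for the built-in options.

    Hypothetical helper: it mirrors the documented lookup order only.
    """
    if c_option is not None:
        # -c was supplied: use the named file.
        return c_option
    candidate = os.path.join(cwd, "RTF2SFM.INI")
    if os.path.isfile(candidate):
        # No -c, but RTF2SFM.INI exists in the current directory.
        return candidate
    # Neither applies: fall back to the built-in standard options.
    return None
```

If you read a control file yourself, remember that it is assumed to be UTF-8, so open it with an explicit encoding.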

Any residue (e.g., text in a style for which there is no sf tag defined, or unknown RTF destinations) is written to a residue file (named after the output file if -o is supplied, otherwise named residue.res).
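For scripted or batch conversions, the command line from the synopsis can be assembled programmatically. The Python sketch below only builds the argument list from the documented options. The helper name build_rtf2sfm_command and the file names are hypothetical, and running the result assumes RTF2SFM is on your PATH.

```python
def build_rtf2sfm_command(in_file, out_file=None, control_file=None,
                          annotation_file=None, quiet=False,
                          skip_symbols=False):
    """Assemble an RTF2SFM argument list from the documented options."""
    cmd = ["RTF2SFM"]
    if skip_symbols:
        cmd.append("-s")          # suppress Insert Symbol processing
    if quiet:
        cmd.append("-q")          # no % completion or 'done' message
    if control_file:
        cmd += ["-c", control_file]
    if out_file:
        cmd += ["-o", out_file]
    if annotation_file:
        cmd += ["-a", annotation_file]
    cmd.append(in_file)
    return cmd

# Hypothetical file names, for illustration only.
cmd = build_rtf2sfm_command("genesis.rtf", out_file="genesis.sfm", quiet=True)
print(" ".join(cmd))
# To run the conversion (requires RTF2SFM to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```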

Usage notes

Residue file

RTF2SFM always generates a residue file that contains information about things in the RTF that it didn't understand. Always review the residue file. Although some of the messages may not make much sense, most messages end with a [chapter:verse] reference (assuming such are present in the input) which will point you to the offending area of your RTF file.

There are two common kinds of residue: unidentified styles and unhandled destinations.

Unidentified styles

Messages that report an unidentified style are likely to be important. For example, a message naming the style 'Subtitle' with a reference to chapter 31, verse 30 means that the document contains, near that verse, text in a style that RTF2SFM doesn't know about. If your document is correct, then you need to extend the control file to tell RTF2SFM what should happen with text in this style.

Unhandled destinations

Another message you may see in the residue looks like:

     end dest: '*xyzzy', '4'  [10: 1]

RTF is, even after all these years, an evolving standard. New features of programs like Microsoft Word often mean new features in the RTF. Sometimes these come as new "destinations" that RTF2SFM doesn't know about, and this causes this type of message.

Most likely these errors can be ignored. But check the verses indicated to see if any important text is missing from the SFM file.
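When a residue file is long, it can help to list the references it mentions before checking them in the SFM output. The following Python sketch extracts trailing [chapter:verse] references from residue lines. The regular expression is an assumption based on the single sample message shown above (which has a space after the colon), and the function name is hypothetical. Real residue files may contain other formats.

```python
import re

# Matches a trailing reference such as "[10: 1]" or "[31:30]".
REFERENCE = re.compile(r"\[\s*(\d+)\s*:\s*(\d+)\s*\]\s*$")

def residue_references(lines):
    """Return sorted, de-duplicated (chapter, verse) pairs found in the lines."""
    found = set()
    for line in lines:
        match = REFERENCE.search(line)
        if match:
            found.add((int(match.group(1)), int(match.group(2))))
    return sorted(found)

sample = ["     end dest: '*xyzzy', '4'  [10: 1]"]
print(residue_references(sample))  # [(10, 1)]
```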

Also, I would appreciate receiving copies of RTF files (and the control file you are using, if any) that generate such messages. I will attempt to eliminate the offending destinations in future releases.

If you can't figure out the residue, you can contact me.

Processing of Insert Symbol fields

If the font used for an inserted symbol has a known mapping to Unicode, then RTF2SFM will convert the symbol according to that mapping. Then, if the resultant Unicode character is one that has the mirrored property and the immediately preceding character was in a right-to-left run, RTF2SFM will surround the mirrored character with U+202D LEFT-TO-RIGHT OVERRIDE and U+202C POP DIRECTIONAL FORMATTING so it will display correctly. Currently the only font with a known mapping is the Symbol font.
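The wrapping step can be illustrated with a short Python sketch. It is not RTF2SFM's code. The function name is hypothetical, and it approximates "the preceding character was in a right-to-left run" by checking whether the preceding character has bidirectional class R or AL, which is a simplifying assumption. The Symbol-font mapping itself is not reproduced here.

```python
import unicodedata

LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

def wrap_if_mirrored(symbol, previous_char):
    """Wrap a mirrored character in LRO...PDF when it follows RTL text."""
    is_mirrored = unicodedata.mirrored(symbol) == 1
    # Assumption: bidi class R or AL on the previous character stands in
    # for "the preceding character was in a right-to-left run".
    follows_rtl = bool(previous_char) and \
        unicodedata.bidirectional(previous_char) in ("R", "AL")
    if is_mirrored and follows_rtl:
        return LRO + symbol + PDF
    return symbol
```

For example, an opening parenthesis following a Hebrew letter would be wrapped, while the same parenthesis following a Latin letter would be returned unchanged.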

If the font is unknown, then the symbol is mapped to the PUA area, specifically to the range U+F000 to U+F0FF.
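As a rough illustration, the Python sketch below maps an 8-bit character code into that range. The source states only the target range. Adding the character code to U+F000 is an assumption (it follows the usual convention for symbol fonts), and the function name is hypothetical.

```python
def symbol_to_pua(char_code):
    """Map an 8-bit symbol-font character code into U+F000..U+F0FF."""
    if not 0 <= char_code <= 0xFF:
        raise ValueError("expected an 8-bit character code")
    # Assumption: the code is used directly as an offset from U+F000.
    return chr(0xF000 + char_code)

print(f"U+{ord(symbol_to_pua(0xB7)):04X}")  # U+F0B7
```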

Known problems

When a file with tracked changes contains deleted text, the marker associated with that text may still be output (even though there is no data after the marker).

Installation instructions

Please note that RTF2SFM is a command line utility. To use it you need to open up a Command window. You can set up shortcuts to the program if you like, but there is no pretty graphical user interface.

Standalone Windows Executable

If you want to use the standalone Windows executable, simply download it and put it in a folder on your PATH somewhere.

Note

The downloadable EXE files are not installers or setup programs — they are the actual program. Simply put the EXE in a directory (such as Windows) that is named on your PATH variable. To find out what directories are named on your path, start a command window and type PATH <return>. The directories will be delimited by semicolons.
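If you have Python available, the same check can be scripted. The sketch below lists the directories on your PATH, one per line, and then reports whether a program named RTF2SFM can be found there. The helper name path_directories is hypothetical. os.pathsep is the semicolon on Windows.

```python
import os
import shutil

def path_directories(path_value=None, separator=os.pathsep):
    """Split a PATH string into its non-empty directory entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [d for d in path_value.split(separator) if d]

for directory in path_directories():
    print(directory)

# shutil.which returns the full path if the program is found on PATH, else None.
print(shutil.which("RTF2SFM"))
```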

SIL-RTF perl module

If you want to use the Perl source code rather than the stand-alone Windows executable, then you must have Perl 5.8 or later installed. Download the archive and unpack it to a temporary directory. Start a command shell in the SIL-RTF-1.5 subfolder and execute the command sequence:

perl makefile.pl
make
make install

Now you should be able to execute RTF2SFM from any command prompt.

If you don't have a make program, you might see if Microsoft still offers their older NMAKE15.EXE.

Disclaimers

The RTF2SFM program and the SIL::RTF module on which it is based are unreleased software and carry no warranties of any kind. Use at your own risk.

If you find or fix bugs then the author would appreciate hearing from you. See support for contact information.

Change history

Version  Date posted  What's new

1.10  2009-10-16
  • Changes to support converting typeset dictionaries (better handling of empty markers, adding space after SFMs due to character styles)
  • Added -q option

1.9  2009-10-05
  • Quiet unhandled destination message re: pntext
  • Added -p option
  • Added support for [destinations] in INI

1.8  2009-09-28
  • Quiet unhandled destination messages re: upr, *ud

1.7  2009-03-27
  Major update:
  • Insert \v 1 when it is absent
  • Changed default control file to match USFM ver 2.2
  • Reorders things like \s and \r to be after \c
  • Extended help
  • Removed the anchor from textbefore handling so it can be more useful: it can now be an expression like [^0-9] to remove anything other than digits in the chapter & verse numbers. Since this would break the existing .INI files for footnotes, I've made this behaviour controlled by a new option, textbeforeIsUnanchored, in the [options] section of the .INI file.
  • Using this capability, added a regex to the c and v textbefore to strip out all but digits
  • Removed {} from char style SFMs that have no endmarker (e.g. \tr \tc)
  • Quiet unhandled destinations: *defchp *defpap *themedata *colorschememapping *datastore *background
  • Fixed the bug that caused the "Can't find Unicode property definition "Mirrored"" message

1.6  2007-09-07
  Bug fixes:
  • Now handles MacRoman character set
  • Less residue from Word 2003 documents

1.5  2006-02-16
  Bug fixes:
  • Correction to footnote endings when not inline
  • Better handling of whitespace around chapter and verse numbers
  • Was completely omitting empty markers (e.g. \b)

1.4  2005-01-24
  • Support for UBS usfm (ver 2.0). Control file for usfm available in ControlFileExamples.zip.
  • Can identify PT6 generated hard formatted superscript footnote callers; can also put the footnote caller literal into the SFM
  • Normalizes style names to remove spaces after commas

1.3  2005-01-18
  • Allow embedded styles to be defined with an endtag, in which case the end tag is used to delimit text rather than enclosing the text in braces {...}
  • Warn about inappropriate styles or missing style defs in footnotes
  • Added [chapter:verse] to RESIDUE output to aid manual review
  • Remove Old Properties destinations (*oldcprops, *oldpprops, *oldtprops, *oldsprops) from residue
  • Provide version of EXE that supports CJK character sets

1.1  2004-10-01
  • Support for a few archaic character sets added

1.0  2004-09-22
  • Understands Insert Symbol fields, but this can be suppressed (for slight speed improvement) by supplying -s
  • Removes escapes that preceded some symbols in output
  • Can supply Paratext footnote caller character via .INI

0.8  2004-08-04
  • Rewritten to utilize Perl 5.8 Unicode facilities. Will no longer run on pre-5.8 Perl versions. RTF2SFM users should be unaffected, but if you are using the RTF parser for other programs, check the documentation for changes.
  • Removed *pgptbl from residue (by skipping)
  • Detect and warn about missing input file
  • Detect and warn about paragraph styles used but not in configuration

0.7  2003-11-03
  • Paratext-compatible footnotes; fix ZWNJ problem

0.6  2003-10-02
  • Did some cleanup on the residue file

0.5  2003-07-04
  • First version posted here

Downloads

Standalone Windows executable of RTF2SFM program (ver 1.10)
Bob Hallissy, 2009-10-16
Download "RTF2SFM.exe", Windows application, 3MB [865 downloads]
Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download.
Bob Hallissy, 2009-10-16
Download "RTF2SFM-full.exe", Windows application, 4MB [602 downloads]
Example RTF2SFM control files, including SFConverter.ini and several USFM versions (2.0 through 2.2)
Bob Hallissy, 2009-03-27
Download "ControlFileExamples.zip", ZIP archive, 24KB [639 downloads]

To obtain the Perl source module, view the public Subversion repository or download the tarball.

Previous versions

Standalone Windows executable of RTF2SFM program (ver 1.9)
Bob Hallissy, 2009-10-05
Download "RTF2SFM.exe", Windows application, 3MB [565 downloads]
Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download.
Bob Hallissy, 2009-10-05
Download "RTF2SFM-full.exe", Windows application, 4MB [554 downloads]
Standalone Windows executable of RTF2SFM program (ver 1.8)
Bob Hallissy, 2009-09-28
Download "RTF2SFM.exe", Windows application, 3MB [529 downloads]
Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download.
Bob Hallissy, 2009-09-28
Download "RTF2SFM-full.exe", Windows application, 4MB [494 downloads]
Standalone Windows executable of RTF2SFM program (ver 1.7 beta)
Bob Hallissy, 2009-03-27
Download "RTF2SFM.exe", Windows application, 3MB [665 downloads]
Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download.
Bob Hallissy, 2009-03-27
Download "RTF2SFM-full.exe", Windows application, 4MB [544 downloads]
Standalone Windows executable of RTF2SFM program (ver 1.6)
Bob Hallissy, 2007-09-07
Download "RTF2SFM.exe", Windows application, 2MB [874 downloads]
Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download.
Bob Hallissy, 2007-09-07
Download "RTF2SFM-full.exe", Windows application, 4MB [842 downloads]
Example RTF2SFM control files, including DefaultControlFile.ini (matches RTF2SFM default) and USFM (versions 2.0, 2.05, and 2.1)
Bob Hallissy, 2007-09-07
Download "ControlFileExamples.zip", ZIP archive, 11KB [1471 downloads]
Standalone Windows executable of RTF2SFM program (ver 1.5)
Bob Hallissy, 2006-02-16
Download "RTF2SFM.exe", Windows application, 2MB [1299 downloads]
Standalone Windows executable of RTF2SFM program (ver 1.5) including CJK
Bob Hallissy, 2006-02-16
Download "RTF2SFM-full.exe", Windows application, 3MB [1205 downloads]
Perl-based RTF parser SIL::RTF, including RTF2SFM program (ver 1.4)
Bob Hallissy, 2005-01-24
Download "SIL-RTF-1.4.tar.gz", gzipped tar archive, 32KB [1308 downloads]
Standalone Windows executable of RTF2SFM program (ver 1.4)
Bob Hallissy, 2005-01-24
Download "RTF2SFM.exe", Windows application, 2MB [1086 downloads]
Same as above but includes mappings for Chinese, Japanese and Korean character sets and is, as a result, a larger download.
Bob Hallissy, 2005-01-24
Download "RTF2SFM-full.exe", Windows application, 3MB [1372 downloads]

Support

Randy Hasty has written a tutorial on how to use this tool to convert RTF documents to SFM. Though written in terms of version 0.7, it may still be helpful.

As this program is provided at no cost, I am unable to provide a commercial level of personal technical support. I am interested in hearing from you, however, and will try to resolve problems that are reported to me. You can send feedback to me via a webform here. Alternatively, my email address looks like Вob_Нallissy@ѕіl.org (but cutting & pasting from this window into your emailer won't result in a working address; you will need to type it into your email program).

Other resources

Microsoft Rich Text Format (RTF) specifications, versions 1.6 for older versions of Word, 1.7 for Word 2002, and 1.8 for Word 2003.




© 2003-2014 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative. Contact us here.