Short URL: http://scripts.sil.org/RTF2SFM
RTF2SFM — converting styled Word documents to SFM
Bob Hallissy, 2009-10-16
RTF2SFM converts a styled Word .RTF file to UTF8-encoded SFM. Unlike the old SF Converter package, RTF2SFM
correctly handles Unicode characters.
RTF2SFM is part of the SIL::RTF Perl module that provides an event-driven parser for examining or
processing RTF files. The program is supplied either as a Perl module (requiring Perl 5.8 or later) or as a
standalone Windows .EXE file.
New version: 1.10 (2009-10-16); see Change history for details.
Contents
Synopsis
Options
Usage notes
Residue file
Unidentified styles
Unhandled destinations
Processing of fields
Known problems
Installation instructions
Standalone Windows Executable
SIL-RTF perl module
Disclaimers
Change history
Downloads
Previous versions
Support
Other resources
Synopsis
RTF2SFM [-s] [-q] [-c ControlFile] [-o outFile] [-a annotationFile] inFile
RTF2SFM -p [-o outFile]
RTF2SFM [-h] [-v]
Converts a styled Word .RTF file to UTF8-encoded SFM.
Options
- -c names an options configuration (control) file
- -o names an output file (otherwise writes to STDOUT)
- -a names an output file to hold annotations (comments, revision tracking)
- -s suppresses the extra processing needed to convert Insert Symbol fields
- -p outputs the built-in control file to outFile or STDOUT
- -q quiet mode (no % completion or 'done' message)
- -h outputs the extended help message
- -v outputs version information
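The first form in the Synopsis can also be assembled from a script. The Python sketch below is only an illustration: the helper name build_rtf2sfm_command and the file names are hypothetical, and it assumes an RTF2SFM executable is reachable on your PATH. It uses only the flags listed above.

```python
def build_rtf2sfm_command(in_file, out_file=None, control_file=None,
                          annotation_file=None, quiet=False,
                          skip_symbols=False, exe="RTF2SFM"):
    """Assemble an argument list that follows the first Synopsis form."""
    cmd = [exe]
    if skip_symbols:
        cmd.append("-s")              # suppress Insert Symbol processing
    if quiet:
        cmd.append("-q")              # no progress or 'done' message
    if control_file:
        cmd += ["-c", control_file]   # explicit control (.INI) file
    if out_file:
        cmd += ["-o", out_file]       # otherwise output goes to STDOUT
    if annotation_file:
        cmd += ["-a", annotation_file]
    cmd.append(in_file)               # the input file always comes last
    return cmd


# Hypothetical file names, for illustration only.
print(build_rtf2sfm_command("genesis.rtf", out_file="genesis.sfm", quiet=True))
```

Once RTF2SFM is installed, the resulting list could be passed to subprocess.run(cmd, check=True).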
If -c is not supplied, RTF2SFM looks for an RTF2SFM.INI file in the current directory. If RTF2SFM.INI isn't
found, it uses a standard set of options (based on SFConverter). Note: .INI files are assumed to be UTF-8! The
control file has syntax like a Windows .INI file. Use the -p option or see Downloads for working examples.
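The lookup order for the control file can be sketched in a few lines of Python. This is not RTF2SFM's own code: choose_control_file is a hypothetical helper, and the exact file-name matching (for example, case sensitivity on non-Windows systems) is an assumption.

```python
import os


def choose_control_file(c_option=None, directory="."):
    """Return (source, path) describing which control file would apply.

    Order, as described in the text: the -c option first, then RTF2SFM.INI
    in the current directory, then the built-in defaults.
    """
    if c_option:
        return ("-c option", c_option)
    candidate = os.path.join(directory, "RTF2SFM.INI")
    if os.path.isfile(candidate):
        return ("current directory", candidate)
    # No control file found: the built-in options (based on SFConverter) apply.
    return ("built-in defaults", None)


print(choose_control_file(c_option="usfm.ini"))   # hypothetical file name
print(choose_control_file())
```

To see what the built-in defaults contain, the -p option writes them out as a control file.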
Any residue (e.g., text in a style for which there is no sf tag defined, or unknown RTF destinations) is
written to a residue file (named after the output file if -o supplied, else
named residue.res).
Usage notes
Residue file
RTF2SFM always generates a residue file that contains information about things in the RTF that it didn't
understand. Always review the residue file. Although some of the messages may not make much sense, most
messages end with a [chapter:verse] reference (assuming such are present in the input) which will point you
to the offending area of your RTF file.
There are two common kinds of residue: unidentified styles and unhandled destinations.
Unidentified styles
Messages that name a style RTF2SFM doesn't recognize are likely to be important. For example, such a message
naming the style 'Subtitle' with a reference to chapter 31, verse 30 means that the document contains, near
that verse, text in a style ('Subtitle') that RTF2SFM doesn't know about. If your document is correct, then you
need to enhance the control file for RTF2SFM to tell the program what should happen with text in this style.
Unhandled destinations
Another message you may see in the residue looks like:
end dest: '*xyzzy', '4' [10: 1]
RTF is, even after all these years, an evolving standard. New features of programs like Microsoft Word
often mean new features in the RTF. Sometimes these come as new "destinations" that RTF2SFM doesn't know
about, and this causes this type of message.
Most likely these errors can be ignored. But check the verses indicated to see if any important text is
missing from the SFM file.
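Because most residue messages end with a [chapter:verse] reference, a short script can collect those references for review. The Python sketch below assumes the reference has the shape shown in the example message above (digits, a colon, optional spaces, digits, all in square brackets at the end of the line). The function name is hypothetical, and other residue messages may need a different pattern.

```python
import re

# Trailing reference such as "[10: 1]": chapter, colon, optional spaces, verse.
REFERENCE = re.compile(r"\[\s*(\d+)\s*:\s*(\d+)\s*\]\s*$")


def residue_references(lines):
    """Yield (chapter, verse, message) for each line ending in a reference."""
    for line in lines:
        match = REFERENCE.search(line)
        if match:
            message = line[:match.start()].rstrip()
            yield int(match.group(1)), int(match.group(2)), message


sample = ["end dest: '*xyzzy', '4' [10: 1]"]
for chapter, verse, message in residue_references(sample):
    print(f"check chapter {chapter}, verse {verse}: {message}")
```

Lines without a trailing reference are skipped, so they still need to be read by hand.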
Also, I would appreciate receiving copies of RTF files (and the control file you are using, if any) that
generate such messages. I will attempt to eliminate the offending destinations in future releases.
If you can't figure out the residue, you can contact me.
Processing of fields
If the font used for an inserted symbol has a known mapping to Unicode, then RTF2SFM will convert the
symbol according to that mapping. Then, if the resultant Unicode character is one that has the mirrored
property and the immediately preceding character was in a right-to-left run, RTF2SFM will surround the
mirrored character with U+202D LEFT-TO-RIGHT OVERRIDE and U+202C POP DIRECTIONAL FORMATTING so it will
display correctly. Currently the only font with a known mapping is the Symbol font.
If the font is unknown, then the symbol is mapped to the Private Use Area (PUA), specifically to the range
U+F000 to U+F0FF.
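The symbol rules can be illustrated with a small Python sketch. It is not the SIL::RTF implementation. The two-entry Symbol table is only an excerpt chosen for the example, "preceding character was in a right-to-left run" is approximated by checking the bidirectional class of the previous character, and sending codes missing from the excerpt to the PUA is an assumption of this sketch.

```python
import unicodedata

# Tiny illustrative excerpt of a Symbol-font mapping (not the tool's table).
SYMBOL_TO_UNICODE = {0x61: "\u03b1",   # alpha
                     0xA3: "\u2264"}   # less-than or equal to (mirrored)
KNOWN_FONTS = {"symbol": SYMBOL_TO_UNICODE}

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def convert_symbol(font, code, previous_char=""):
    """Convert one inserted symbol to a Unicode string."""
    table = KNOWN_FONTS.get(font.lower())
    if table is None or code not in table:
        # Unknown font (or code outside this excerpt): use U+F000..U+F0FF.
        return chr(0xF000 + (code & 0xFF))
    char = table[code]
    # Simplification: treat a strong right-to-left previous character
    # (class R or AL) as meaning "we are in a right-to-left run".
    after_rtl = (previous_char != ""
                 and unicodedata.bidirectional(previous_char) in ("R", "AL"))
    if unicodedata.mirrored(char) and after_rtl:
        return LRO + char + PDF
    return char


print(ascii(convert_symbol("Symbol", 0xA3, "\u05d0")))  # after a Hebrew letter
print(ascii(convert_symbol("Wingdings", 0x4A)))         # unknown font
```

Wrapping the mirrored character in an override stops it from being displayed as its mirror image inside right-to-left text.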
Known problems
When parsing a file with changes tracked and there is some deleted text, the marker associated with that
text may be output (even though there is no data after the marker).
Installation instructions
Please note that RTF2SFM is a command line utility. To use it you need to open up a Command window. You
can set up shortcuts to the program if you like, but there is no pretty graphical user interface.
Standalone Windows Executable
If you want to use the standalone Windows executable, simply download it and put it in a folder on your
PATH somewhere.
Note
The downloadable EXE files are not installers or setup programs; they are the actual program. Simply put the
EXE in a directory (such as Windows) that is named on your PATH variable. To find out what directories are
named on your path, start a command window and type PATH <return>. The directories will be delimited by
semicolons.
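If Python happens to be installed, the same check can be made from a script. This is an optional sketch, and the helper name path_directories is hypothetical. It splits the PATH value on the platform's separator (a semicolon on Windows) and then asks where, if anywhere, RTF2SFM would be found.

```python
import os
import shutil


def path_directories(path_value=None, separator=os.pathsep):
    """Split a PATH-style string into its non-empty directory entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(separator) if entry]


# A hypothetical Windows PATH value, split on semicolons.
print(path_directories(r"C:\Windows;C:\Tools", separator=";"))

# Directories on the real PATH, and the location of RTF2SFM if it is on it
# (shutil.which returns None when the program is not found).
for directory in path_directories():
    print(directory)
print(shutil.which("RTF2SFM"))
```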
SIL-RTF perl module
If you want to use the Perl source code rather than the stand-alone Windows executable, then you must have
Perl 5.8 or later installed. Download the archive and unpack it to a temporary directory. Start a command
shell in the SIL-RTF-1.5 subfolder and execute the command sequence:
perl makefile.pl
make
make install
Now you should be able to execute RTF2SFM from any command prompt.
If you don't have a make program, you might see if Microsoft still offers their older NMAKE15.EXE.
Disclaimers
The RTF2SFM program and the SIL::RTF module on which it is based are unreleased software and carry no
warranties of any kind. Use at your own risk.
If you find or fix bugs then the author would appreciate hearing from you. See support for contact information.
Change history
1.10 (2009-10-16)
- Changes to support converting typeset dictionaries (better handling of empty markers, adding space after
  SFMs due to character styles)
- Added -q option
1.9 (2009-10-05)
- Quiet unhandled destination message re: pntext
- Added -p option
- Added support for [destinations] in INI
1.8 (2009-09-28)
- Quiet unhandled destination messages re: upr, *ud
1.7 (2009-03-27)
Major update:
- Insert \v 1 when it is absent
- Changed default control file to match USFM ver 2.2
- Reorders things like \s and \r to be after \c
- Extended help
- Remove anchor from textbefore handling so it can be more useful: now it can be an expression like [^0-9]
  to remove anything other than digits in the chapter & verse nums. Since this would break the existing .INI
  files for footnotes, I've made this behaviour controlled by a new option, textbeforeIsUnanchored, from the
  [options] section of the .INI file.
- Use this capability, added regex to \c and \v textbefore to strip out all but digits
- Removed {} from char style SFMs that have no endmarker (e.g. \tr \tc)
- Quiet unhandled destinations: *defchp *defpap *themedata *colorschememapping *datastore *background
- Fix bug that caused "Can't find Unicode property definition "Mirrored"" message.
1.6 (2007-09-07)
Bug fixes:
- Now handles MacRoman character set
- Less residue from Word 2003 documents
1.5 (2006-02-16)
Bug fixes:
- Correction to footnote endings when not inline.
- Better handling of whitespace around chapter and verse numbers.
- Was completely omitting empty markers (e.g. \b)
1.4 (2005-01-24)
- Support for UBS usfm (ver 2.0). Control file for usfm available in ControlFileExamples.zip.
- Can identify PT6 generated hard formatted superscript footnote callers; can also put the footnote caller
  literal into the SFM.
- Normalizes style names to remove spaces after commas.
1.3 (2005-01-18)
- Allow embedded styles to be defined with an endtag, in which case the end tag is used to delimit text
  rather than enclosing the text in braces {...}
- Warn about inappropriate styles or missing style defs in footnotes
- Added [chapter:verse] to RESIDUE output to aid manual review
- Remove Old Properties destinations (*oldcprops, *oldpprops, *oldtprops, *oldsprops) from residue
- Provide version of EXE that supports CJK character sets
1.1 (2004-10-01)
- Support for a few archaic character sets added
1.0 (2004-09-22)
- Understands fields, but this can be suppressed (for slight speed improvement) by supplying -s
- Removes escapes that preceded some symbols in output
- Can supply Paratext footnote caller character via .INI
0.8 (2004-08-04)
- Rewritten to utilize Perl 5.8 Unicode facilities. Will no longer run on pre-5.8 Perl versions. RTF2SFM
  users should be unaffected, but if you are using the RTF parser for other programs, check documentation
  for changes
- Removed *pgptbl from residue (by skipping)
- Detect and warn about missing input file
- Detect and warn about paragraph styles used but not in configuration.
0.7 (2003-11-03)
- Paratext-compatible footnotes; fix ZWNJ problem
0.6 (2003-10-02)
- Did some cleanup on the residue file
0.5 (2003-07-04)
- First version posted here
Downloads
- "RTF2SFM.exe": Standalone Windows executable of RTF2SFM program (ver 1.10). Windows application, 3MB.
  Bob Hallissy, 2009-10-16
- "RTF2SFM-full.exe": Same as above but includes mappings for Chinese, Japanese and Korean character sets
  and is, as a result, a larger download. Windows application, 4MB. Bob Hallissy, 2009-10-16
- "ControlFileExamples.zip": Example RTF2SFM control files, including SFConverter.ini and several USFM
  versions (2.0 through 2.2). ZIP archive, 24KB. Bob Hallissy, 2009-03-27
To obtain the Perl source module, view the public Subversion repository or download the tarball.
Previous versions
- "RTF2SFM.exe": Standalone Windows executable of RTF2SFM program (ver 1.9). Windows application, 3MB.
  Bob Hallissy, 2009-10-05
- "RTF2SFM-full.exe": Same as above but includes mappings for Chinese, Japanese and Korean character sets
  and is, as a result, a larger download. Windows application, 4MB. Bob Hallissy, 2009-10-05
- "RTF2SFM.exe": Standalone Windows executable of RTF2SFM program (ver 1.8). Windows application, 3MB.
  Bob Hallissy, 2009-09-28
- "RTF2SFM-full.exe": Same as above but includes mappings for Chinese, Japanese and Korean character sets
  and is, as a result, a larger download. Windows application, 4MB. Bob Hallissy, 2009-09-28
- "RTF2SFM.exe": Standalone Windows executable of RTF2SFM program (ver 1.7 beta). Windows application, 3MB.
  Bob Hallissy, 2009-03-27
- "RTF2SFM-full.exe": Same as above but includes mappings for Chinese, Japanese and Korean character sets
  and is, as a result, a larger download. Windows application, 4MB. Bob Hallissy, 2009-03-27
- "RTF2SFM.exe": Standalone Windows executable of RTF2SFM program (ver 1.6). Windows application, 2MB.
  Bob Hallissy, 2007-09-07
- "RTF2SFM-full.exe": Same as above but includes mappings for Chinese, Japanese and Korean character sets
  and is, as a result, a larger download. Windows application, 4MB. Bob Hallissy, 2007-09-07
- "ControlFileExamples.zip": Example RTF2SFM control files, including DefaultControlFile.ini (matches
  RTF2SFM default) and USFM (versions 2.0, 2.05, and 2.1). ZIP archive, 11KB. Bob Hallissy, 2007-09-07
- "RTF2SFM.exe": Standalone Windows executable of RTF2SFM program (ver 1.5). Windows application, 2MB.
  Bob Hallissy, 2006-02-16
- "RTF2SFM-full.exe": Standalone Windows executable of RTF2SFM program (ver 1.5) including CJK. Windows
  application, 3MB. Bob Hallissy, 2006-02-16
- "SIL-RTF-1.4.tar.gz": Perl-based RTF parser SIL::RTF, including RTF2SFM program (ver 1.4). Gzipped tar
  archive, 32KB. Bob Hallissy, 2005-01-24
- "RTF2SFM.exe": Standalone Windows executable of RTF2SFM program (ver 1.4). Windows application, 2MB.
  Bob Hallissy, 2005-01-24
- "RTF2SFM-full.exe": Same as above but includes mappings for Chinese, Japanese and Korean character sets
  and is, as a result, a larger download. Windows application, 3MB. Bob Hallissy, 2005-01-24
Support
Randy Hasty has written a tutorial on how to use this tool
to convert RTF documents to SFM. Though written in terms of version 0.7, it may still be helpful.
As this program is provided at no cost, I am unable to provide a commercial level of personal technical
support. I am interested in hearing from you, however, and will try to resolve problems that are reported to
me. You can send feedback to me via a webform here. Alternatively,
my email address looks like Вob_Нallissy@ѕіl.org (but cutting & pasting from this window into your
emailer won't result in a working address — you will need to type it into your email program.)
Other resources
Microsoft Rich Text Format (RTF) specifications: version 1.6 for older versions of Word, 1.7 for Word 2002,
and 1.8 for Word 2003.
© 2003-2013 SIL International, all rights
reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.