This is an archive of the original site, preserved as a historical reference. Some of the content is outdated. Please consult our other sites for more current information: ScriptSource, FDBP, and silfontdev.



Converting RTF to SFM using RTF2SFM perl script

Randy Hasty, 2004-04-15


This tutorial is based on Bob Hallissy's RTF2SFM program. RTF2SFM is a useful tool for converting a styled Word .RTF file to UTF-8-encoded SFM (Standard Format Marker). Unlike SFConv, it correctly handles Unicode characters.

Please Read this first!

This tutorial was written for an older version of RTF2SFM, which required Perl 5.6. For Windows users, the latest version is available as a standalone Windows executable and no longer requires Perl to be installed first.

In addition, the program now has more extensive built-in help (see the -h option), so you no longer have to look into the Perl source to understand the control file options.

For those not on Windows or who, for other reasons, want the Perl source, it now requires Perl 5.8 or newer.

Download and Install Perl 5.6.1

Download Perl

Install Perl 5.6.1 as this is the version required for the RTF2SFM conversion program to work.


If you already had a “500” version installed (this one is a “600”), installing this will fail. You will need to uninstall the prior version and do some registry maintenance to get this to work. This is all documented in the html Release Notes.


The download file is rather large at over 8 MB. For downloading and installing Perl you will need at least 50 MB free on your hard drive.

  1. Connect to the ActiveState web site.
  2. On the Downloads screen look for ActivePerl 5.6.1 build nnn and click MSI next to the Windows entry.

This downloads the file ActivePerl-5.6.1.nnn-MSWin32-x86.msi, which you can install by double-clicking it in Windows Explorer.


Perl 5.6.1 can also be found on the “CTC02 Resource Collection” CD.

Install Perl

Install Perl to c:\Perl.

  1. To begin the install, from Windows Explorer, navigate to the folder where you downloaded Perl and double-click the file ActivePerl-5.6.1.nnn-MSWin32-x86.msi.
  2. On the Welcome screen click Next.
  3. On the License Agreement screen select I accept the terms of the license agreement and click Next.
  4. On the Custom Setup screen make sure the location points to c:\Perl.
    1. If not, click Browse.
    2. On Change current destination folder enter c:\Perl as the folder name.
    3. Click OK.
  5. Click Next.
  6. On New features in PPM click Next (leave the checkbox unchecked).
  7. On Choose Setup Options check both options: Add Perl to the PATH environment, and Create Perl file extension association.
  8. Click Next.
  9. On Ready to install click Install.
  10. When the install is complete click Finish.

Verify Perl Installed

  1. At the Command Prompt type path and press Enter. C:\perl\bin should be listed as the first directory in the path.
  2. At the Command Prompt type perl -v and press Enter. The first line of the message should show that this is v5.6.1.
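The two manual checks above can also be scripted. The following is a minimal Python sketch, not part of RTF2SFM: the function names are hypothetical, and it assumes only that the `perl -v` banner contains a version written as `v5.6.1`, as described in step 2.

```python
import re
import shutil
import subprocess


def parse_perl_version(banner):
    """Return (major, minor, patch) from `perl -v` output, or None if absent."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_perl_version():
    """Run `perl -v` if perl is on the PATH; return its version tuple or None."""
    if shutil.which("perl") is None:
        return None  # perl is not on the PATH
    result = subprocess.run(["perl", "-v"], capture_output=True, text=True)
    return parse_perl_version(result.stdout)
```

Calling `installed_perl_version()` returns `None` when Perl is not on the PATH, which corresponds to the first check failing.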

Install RTF2SFM


Download the RTF2SFM ZIP archive for the program (current version 0.7) to your hard drive and unzip it to a directory of your choice.


Follow the website’s installation instructions.

This should create RTF2SFM.PLX and RTF2SFM.BAT in c:\Perl\bin so that when you type RTF2SFM at the Command Prompt it actually executes.


At the Command Prompt type in RTF2SFM and press  Enter . You should get the following message:

Converting RTF to USFM

Download and unzip sample files

41MAT.RTF, RTF2SFM.ini and AreRTFStylesDefinedInRTF2SFMConversionFile macro
Lorna A. Priest, 2004-06-25
Download: ZIP archive, 63KB
  1. Download the above file.
  2. Unzip the data in the file into a suitable working directory such as c:\Jobs\Paratext\RTF2SFM.
  3. Skip to “Running the Conversion” if you just want to use the sample .ini file, which is included with the download. If you want to learn how to do it all yourself, continue on.

Get list of styles used in the RTF file to convert to USFMs

Open the file to be converted in Word. 41MAT.RTF is provided for you as an example. Confirm which styles are used. You can do this in Word 2000 (9.0) by selecting Format, Style… and, under List, choosing Styles in use. Creating macros like the ones that follow is better, as they generate a list of used styles that can be copied later. The macros search to see which of the styles available to the document are actually used in it. They also check that list against the styles found in RTF2SFM.plx to see if you need to do anything else.

Sub AreRTFStylesDefinedInRTF2SFMConversionFile()
' AreRTFStylesDefinedInRTF2SFMConversionFile Macro
' Description:  Creates a list of styles found in an open RTF file.
'               Reports which listed style(s) is NOT found in a
'               designated RTF2SFM conversion file.
' Prerequisite: 1. The RTF file must be open in the Active Window and
'                  should already be saved.
'               2. Word thinks the RTF file has been modified, so make
'                  sure you DON'T SAVE the RTF file, just to be safe.
   ListStylesInDoc             ' build the list of styles used in the document
   VerifyStylesInConversion    ' check that list against the conversion file
End Sub

Sub ListStylesInDoc()
' ListStylesInDoc Macro
' This macro will create a list of the styles used in a Word Document,
' It will display a maximum of 500 styles. It was created to analyze a
' scripture word document/rtf file. It will run on the Active Word
' Window and:
' 1. make sure window is set to "show all" so hidden styles will be processed
' 2. look through list of styles in the document
' 3. find which styles actually exist in the document
' 4. create a new document containing the list of styles used
' 5. restore "show all" setting for the original document
'    and NOT show all for the newly created List of Styles doc

   Dim ActiveDoc As String
   Dim CurrStyle As String
   Dim FoundStyle As Boolean
   Dim IgnoreStyleCnt As Integer
   Dim J As Integer
   Dim Msg As String
   Dim NameOfDoc As String
   Dim StyleCount As Integer
   Dim StyleList(500) As String
   Dim WasShowAll As Boolean
   Dim X As Long
   Msg = "This macro uses Find to locate styles used in this    " + vbCr + _
         "RTF file, specifically Find / Format / Style. Using " + vbCr + _
         "this feature somehow makes Word think that a    " + vbCr + _
         "change has occurred. Then, when you go to close   " + vbCr + _
         "the document, Word asks if you want to save the    " + vbCr + _
         "changes to the file." + vbCr + _
         vbCr + _
         "While the macro has made no actual changes to " + vbCr + _
         "the data, it is still safest NOT TO SAVE the RTF" + vbCr + _
         "file when closing it."
   MsgBox Msg, vbInformation, "WARNING"
   ' 1. make sure window is set to "show all" so hidden styles will be processed
   NameOfDoc = ActiveWindow.ActivePane.Document.Name
   If ActiveWindow.ActivePane.View.ShowAll Then  ' save whether the document was set to Show All
     WasShowAll = True
   Else
     WasShowAll = False
     ActiveWindow.ActivePane.View.ShowAll = True
   End If
   ' 2. look through list of styles in the document
   StyleCount = 0
   X = ActiveDocument.Styles.Count       ' count number of styles available to this document
   For J = 1 To X                        ' for each available style: search to see if it is in this doc
      CurrStyle = ActiveDocument.Styles.Item(J).NameLocal    ' capture style name
   ' 3. find which styles actually exist in the document
      With Selection.Find
          .ClearFormatting               ' remove any previous formatting from match
          .Text = ""                     ' match any text with the Current Style
          .Replacement.Text = ""         ' no replace will take place
          .Forward = True                ' search from current location forward
          .Wrap = wdFindContinue         '   wrapping past the end to the start of the document
          .MatchCase = False
          .MatchWholeWord = False
          .MatchWildcards = False
          .MatchSoundsLike = False
          .MatchAllWordForms = False
          .Format = True                 ' special format options will be part of find
          .Style = CurrStyle             ' look for CurrStyle style name
      End With
      If Selection.Find.Execute Then     ' If CurrStyle found
        Select Case CurrStyle
          Case "Default Paragraph Font"  ' skip these Word-only styles
          Case "No List"                 ' this is a category of styles in Word 2003
          Case Else
            StyleCount = StyleCount + 1      '    add it to a list of found styles
            StyleList(StyleCount) = CurrStyle
        End Select
      End If
   Next J
   ' 4. create a new document containing the list of styles used
   Documents.Add
   For J = 1 To StyleCount
     Selection.TypeText Text:=StyleList(J)
     Selection.TypeParagraph             ' one style name per paragraph
   Next J

   ' 5. restore "show all" setting for the original document
   '    and NOT show all for the newly created List of Styles doc
   ActiveWindow.ActivePane.View.ShowAll = False
   If Not WasShowAll Then               ' it was hidden before the macro ran
     Windows(NameOfDoc).ActivePane.View.ShowAll = False
   End If
   StatusBar = StyleCount & " style(s) used in the document"

End Sub

Sub VerifyStylesInConversion()
' VerifyStylesInConversion Macro
' Description:  Checks if styles in a textual List of Styles doc are
'               defined in the conversion file used by the RTF2SFM
'               Perl script. This macro will:
' 1. Count the styles listed in the List of styles doc and move
'    cursor to first entry in the list
' 2. Locate the RTF2SFM configuration file to open
' 3. See if the opened file appears to be an RTF2SFM configuration file
' 4. Identify styles in the List of Styles document that are NOT defined
'    in the specified RTF2SFM conversion/configuration file.
' 5. When stylename not found update entry in List of Styles doc to say
'    NOT FOUND and format entry in BOLD RED
' 6. End of Process wrap up
'               .
' Prerequisite: 1. ListStylesInDoc which creates a list of styles used in
'                  a selected document. That document is the input to this
'               2. Either the RTF2SFM.PLX Perl script with self contained
'                  conversion values for Word styles to SFMs
'                            OR
'                  a hand made RTF2SFM.INI file where you specify the
'                  Word Styles and what SFMs to convert them to.

   Dim StyleCount As Integer
   Dim J As Integer
   Dim LSID As Integer
   Dim Msg As String
   Dim R2S As Integer
   Dim R2S_File As String
   Dim R2S_LookFor As String
   Dim StyleName As String
   Dim X As Long
   R2S_LookFor = "rtf2sfm.plx"  ' modify to request a specific starter filename
   ' 1. Count the styles listed in the List of styles doc and move
   '    cursor to first entry in the list
   LSID = ActiveWindow.Index
   X = ActiveDocument.Paragraphs.Count
   Selection.HomeKey Unit:=wdStory

   ' 2. Locate the RTF2SFM configuration file to open
   Msg = "In the OPEN dialog that follows locate" + vbCr + _
         "the RTF2SFM conversion file to use." + vbCr + vbCr + _
         "This is either the .INI file you created OR" + vbCr + _
         "is the default conversion contained in the " + vbCr + _
         "RTF2SFM.PLX Perl Script installed in the BIN" + vbCr + _
         "subdirectory where Perl was installed."
   MsgBox Msg, vbOKOnly + vbInformation, "RTF2SFM Conversion File"

   With Dialogs(wdDialogFileOpen)
      .Name = R2S_LookFor
      If .Display = 0 Then
         MsgBox "Open Canceled - Macro Terminated", vbExclamation
         Exit Sub
      End If     ' If no name is entered in the File Name text box,
                 ' Word will open the file with gray background in
                 ' the browser window.
      .Execute   ' opens the configuration file
      R2S_File = ActiveDocument.Path & Application.PathSeparator & ActiveDocument.Name
   End With      ' This is the full path of the file that was opened
   R2S = ActiveWindow.Index
   ' 3. See if the opened file appears to be an RTF2SFM configuration file
   With Selection.Find
       .Forward = True
       .Wrap = wdFindContinue
       .Text = "[styles]"            ' a key element of configuration file
   End With
   If Not Selection.Find.Execute Then
     Msg = R2S_File + vbCr + vbCr + _
           "MISSING [styles] entry." + vbCr + vbCr + _
           "Does not appear to be an RTF2SFM configuration file!" + vbCr + vbCr + _
           "Close file and Rerun." + vbCr + _
           "Select a valid RTF2SFM configuration file."
     MsgBox Msg, vbCritical
     Exit Sub                             ' nothing to verify against
   End If

   ' 4. Identify styles in the List of Styles document that are NOT defined
   '    in the specified RTF2SFM conversion/configuration file.
   ' cursor position starts with the first style in the list
   For J = 1 To X - 1
      Windows(LSID).Activate              ' work in the List of Styles document
      ' if not the first time through move to next style in list
      If (J <> 1) Then
         Selection.MoveDown Unit:=wdLine, Count:=1
      End If
      ' capture the listed stylename and switch to the conversion file
      StyleName = Selection.Paragraphs(1).Range.Text
      StyleName = Left(StyleName, Len(StyleName) - 1) + "="
      Windows(R2S).Activate

      ' see if the stylename followed by an = sign exists in the conversion file
      With Selection.Find
          .Forward = True
          .Wrap = wdFindContinue
          .Text = StyleName
      End With
   ' 5. When stylename not found update entry in List of Styles doc to say
   '    NOT FOUND and format entry in BOLD RED
      If Not Selection.Find.Execute Then
         StyleCount = StyleCount + 1         ' count number of NOT FOUND styles
         Windows(LSID).Activate              ' switch to List of Styles and
         Selection.EndKey Unit:=wdLine       ' modify the entry
         Selection.TypeText Text:=vbTab & "NOT FOUND"
         Selection.HomeKey Unit:=wdLine, Extend:=wdExtend
         With Selection.Font
             .Bold = True
             .ColorIndex = wdRed
         End With
      End If
   Next J
   ' 6. End of Process wrap up
   Windows(R2S).Close SaveChanges:=wdDoNotSaveChanges ' close config file
   Windows(LSID).Activate                             ' switch to List of Styles
   ' if no styles found maybe this is not a valid RTF2SFM configuration file
   If StyleCount = X - 1 Then
     Msg = "NONE of the styles found in configuration file" + _
           vbCr + vbCr + R2S_File + vbCr + vbCr + _
           "Is this an RTF2SFM configuration file?"
     MsgBox Msg, vbCritical
   ' otherwise indicate how many styles weren't defined in the config file
   Else
     Msg = StyleCount & " style(s) NOT FOUND in configuration file" + _
           vbCr + vbCr + R2S_File
     MsgBox Msg, vbInformation
   End If

End Sub
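
For readers who prefer to run the same check outside Word, here is a minimal Python sketch of the idea behind VerifyStylesInConversion. It is not a port of the macro: the function name is hypothetical, it parses `name=value` lines of an .ini-style file instead of using Word's Find, and it assumes that lines starting with `;` are comments and that style names should be compared without regard to case.

```python
def missing_styles(style_names, config_text):
    """Return the style names that have no 'name=' entry in config_text."""
    defined = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines, ';' comments, and lines that define nothing.
        if not line or line.startswith(";") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        defined.add(name.lower())  # assumption: case-insensitive matching
    return [name for name in style_names if name.lower() not in defined]


if __name__ == "__main__":
    sample_config = "; comment\nMain Title=mt\nVerse Num=v,v\n"
    print(missing_styles(["Main Title", "Quote 9"], sample_config))
```

Any name the function returns corresponds to an entry the macro would mark as NOT FOUND.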

Creating the AreRTFStylesDefinedInRTF2SFMConversionFile, ListStylesInDoc, and VerifyStylesInConversion macros

To create these macros you will need to open RTF2SFM_ChkStylesMacro.txt (part of the package you just downloaded) in Word and open the Visual Basic Editor. When done, the macros will be stored in Word’s Normal document template so they can be executed from within any Word document/RTF file.

  1. Open RTF2SFM_ChkStylesMacro.txt in Word.
  2. Select all of the macro code in this document (Ctrl + A) and copy it to the clipboard by pressing Ctrl + C.
  3. Open the Visual Basic Editor by selecting Tools / Macro / Visual Basic Editor.
  4. In the Projects window select Normal / Modules / NewMacros (this might be slightly different on your machine).
  5. Place the cursor at the top of the window containing macro code.
  6. Paste the copied macro code into that window by pressing Ctrl + V.
  7. Save by pressing Ctrl + S. These macros will be saved as AreRTFStylesDefinedInRTF2SFMConversionFile, ListStylesInDoc and VerifyStylesInConversion.
  8. Close the Visual Basic Editor.


This macro sets the window to Show All format codes so that it can process hidden text. If it was set to Hide before the macro executed, it will be restored to hide when it is finished.

Getting the List of Styles found in the RTF document

Open the RTF file to be converted in Word and run the AreRTFStylesDefinedInRTF2SFMConversionFile macro (This macro also runs the other two macros you created):

  1. Select Tools / Macro / Macros…
  2. Double-click the AreRTFStylesDefinedInRTF2SFMConversionFile macro in the window of listed macros.

This will first run the ListStylesInDoc macro and then the VerifyStylesInConversion macro.

You will first get a dialog box titled WARNING. It explains that the macro's use of Find makes Word think the document has changed, and that it is safest not to save the RTF file when closing it.

Read this and then you may click OK.

Next you will get a dialog box titled RTF2SFM Conversion File, asking you to locate the conversion file in the Open dialog that follows.

Read this and then you may click OK.

As the previous dialog box mentioned, the Open dialog that follows asks for the location of rtf2sfm.plx, which you will find in C:\Perl\bin if you installed Perl as per the instructions. Find it and click Open.

Next you will get a final dialog box reporting how many styles were not found in the configuration file.

Read this and then you may click OK.

Now you can look at the temporary Word document that was produced, which contains a list of style names found in the RTF file similar to the following (which was run on the sample file 41MAT.RTF):

Chapter Number
Footnote Reference
Main Title
Poetry Left
Quote / Poetry
Quote 2
Section Head
Verse Num

Any styles not listed in the configuration file are marked NOT FOUND in bold red in this list. Make a note of those.

You may close the sample .rtf file, but remember not to save it.

Build the configuration file to convert the RTF to USFM

An .ini configuration file is used to map the Word style names to USFM codes. A sample of the code can be found in RTF2SFM.PLX. [As per the Options section of Bob Hallissy’s RTF2SFM perl script description on the SIL scripts web site mentioned at the beginning of this paper.]

You can create an .ini file from the RTF2SFM.PLX as follows:

  1. Copy RTF2SFM.PLX located in c:\perl\bin to RTF2SFM.INI in the directory where the RTF to be converted is found. (You may need to change the “read-only” attribute.) The .ini file can have any filename.
  2. Edit RTF2SFM.INI.
  3. Delete all the lines from the start of the file through __DATA__. What remains is the sample configuration, which gives you a jump start on building your own RTF2SFM.ini file.
  4. See if the style names found in the RTF file are already in the configuration file. Add any new style names that need to be converted to USFM.
  5. Specify the USFM, marker type (v, c, e, f, i, r as in RTF2SFM.PLX Tag Replacement Information described later in this paper), text to place before the contents (used with c sometimes), text after (not used), and an ending USFM (e.g. f* for footnote text).
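
Steps 1 to 3 above can be sketched in Python as follows. This is only an illustration, not part of RTF2SFM: the function name is hypothetical, and it assumes the marker appears on a line by itself as `__DATA__`.

```python
def data_section(perl_source):
    """Return everything after the first line consisting solely of __DATA__."""
    lines = perl_source.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.strip() == "__DATA__":
            return "".join(lines[index + 1:])
    raise ValueError("no __DATA__ line found")


if __name__ == "__main__":
    sample = "use strict;\n__DATA__\nMain Title=mt\n"
    print(data_section(sample), end="")
```

To apply it to the real script, read RTF2SFM.PLX as text (the sample data is stated to be UTF-8), pass the contents to `data_section`, and write the result to RTF2SFM.INI in your working directory.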

You associate the style name in the .ini file to a USFM marker by simply adding an equal sign followed by the USFM. See the sample Configuration File that follows.


It is OK to leave all the unused styles in the .ini file.

Sample RTF2SFM.INI configuration file

; Note: The following data is assumed to be in UTF8!
; stylename = <tag>,<type>,<marker>,<textbefore>,<textafter>,<endtag>
; see %sf for details   note: <textafter> is ignored.
; following styles were in the sample book of Matthew but not in the RTF2SFM Sample
;Default Paragraph Font
SecChp Head=s
SecRef Head=s
SecRefChp Head=s
; the following were part of the sample .ini under __DATA__ at the end RTF2SFM.PLX 
Footnote reference=
Main Title=mt
Secondary Title=st
Chapter Number=c,c
Section Head=s
Paragraph Cont=m
Verse Num=v,v
Quote / Poetry=q
Quote 2=q2
Quote 3=q3
Poetry Left=qm
Footnote Text=f,f,*f*,s+,,f*
Emphasized Word(s)=|emph,e
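
To make the field order in the sample file concrete, the following Python sketch splits one mapping line into the six fields named in the comment line above (tag, type, marker, textbefore, textafter, endtag). It is not RTF2SFM's own parser: the function name is hypothetical, and trimming whitespace around each field and padding missing fields with empty strings are assumptions.

```python
FIELDS = ("tag", "type", "marker", "textbefore", "textafter", "endtag")


def parse_mapping(line):
    """Split 'stylename=tag,type,...' into a dict; return None for non-entries."""
    stripped = line.strip()
    # Blank lines, ';' comments, and lines without '=' carry no mapping.
    if not stripped or stripped.startswith(";") or "=" not in stripped:
        return None
    stylename, _, value = stripped.partition("=")
    parts = [part.strip() for part in value.split(",")]
    parts += [""] * (len(FIELDS) - len(parts))  # pad omitted trailing fields
    entry = {"stylename": stylename.strip()}
    entry.update(zip(FIELDS, parts))
    return entry


if __name__ == "__main__":
    print(parse_mapping("Footnote Text=f,f,*f*,s+,,f*"))
```

For the footnote line, the empty fifth field is the unused textafter slot and `f*` is the ending USFM.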

Running the Conversion

You will convert the RTF file(s) to USFM file(s) by running the conversion program from the Command Prompt.

  1. Open a Command Prompt window.
  2. At the Command Prompt change the directory to the location of the .RTF and .INI conversion files.
  3. Run the conversion program as follows: rtf2sfm -c rtf2sfm.ini -o 41mat.sfm 41mat.rtf


You may have to wait 10 or 15 seconds before you see something happening.
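
If you have several RTF files to convert, the command in step 3 can be generated for each one. The Python sketch below only builds the argument list, using the -c and -o options shown above; the function name and the choice to name the output after the input with an .sfm extension are assumptions for illustration.

```python
from pathlib import Path


def rtf2sfm_command(rtf_file, config_file="rtf2sfm.ini"):
    """Build the rtf2sfm argument list for one RTF file."""
    rtf_path = Path(rtf_file)
    sfm_path = rtf_path.with_suffix(".sfm")  # e.g. 41mat.rtf -> 41mat.sfm
    return ["rtf2sfm", "-c", str(config_file), "-o", str(sfm_path), str(rtf_path)]


if __name__ == "__main__":
    print(" ".join(rtf2sfm_command("41mat.rtf")))
```

The resulting list can be passed to `subprocess.run`. On Windows, where RTF2SFM is started through a .BAT file, you may need to pass `shell=True` or name the batch file explicitly.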

Check residue

When converting an RTF file to SFM, the program may find information that it doesn't know what to do with. Sometimes this information is superfluous, e.g., when Microsoft adds extra information to the RTF file structure that isn't of interest. But the extra information may be from a style that wasn't identified in the configuration file. After converting an RTF file, you should always check the residue file (same name as the output SFM file except with extension .RES). If you see data in there that is part of your project and you wanted it in the SFM file, you may need to modify your configuration file and try again.

One of the messages you will see in the test file is:

UNHANDLED dest: '*pn', '', '4' 
     end dest: '*pn', '4' 

Just ignore these messages for now.
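
When a residue file is long, it can help to summarize which destinations were reported. The Python sketch below counts them; it is an illustration only, the function name is hypothetical, and it assumes every such message looks like the `UNHANDLED dest:` line in the sample above, with the destination name as the first quoted value.

```python
import re
from collections import Counter

# Matches lines such as: UNHANDLED dest: '*pn', '', '4'
UNHANDLED = re.compile(r"^\s*UNHANDLED dest:\s*'([^']*)'")


def unhandled_destinations(residue_text):
    """Count how often each destination appears in UNHANDLED messages."""
    counts = Counter()
    for line in residue_text.splitlines():
        match = UNHANDLED.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = "UNHANDLED dest: '*pn', '', '4' \n     end dest: '*pn', '4' \n"
    print(unhandled_destinations(sample))
```

Lines that do not match, including any project text that ended up in the residue, are skipped by this summary and still need to be read by eye.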

Annotations (comments and revision tracking)

When editing a document that will eventually be converted to SFM it may be helpful in some project situations to use Word's comments capability (Insert / Comment) or to turn on change tracking (Tools / Track Changes). RTF2SFM will extract such annotations (into a separate file) if you supply a file name by using the -a parameter.

RTF2SFM.PLX Tag Replacement Information

The PLX file header that follows gives some clues to the content of the replacements for the tag data to the right of the equal signs.

my %sf;
#  $sf{<stylename>}{tag}  is the standard format marker, e.g., c
#  $sf{<stylename>}{type} is one of:
#    'v' = verse number character style
#    'c' = chapter number style        <- Added in ver 0.4
#    'e' = embedded character style
#    'f' = footnote (or similar embedded destination)
#    'i' = anything in this style should be ignored
#    'r' = residue: style was used in doc but didn't have an associated tag.

#  $sf{<stylename>}{marker} is used if {type} eq 'f', and indicates the 
#     placeholder mark to be in the text, e.g, '|fn'  or perhaps '*f*'
#  $sf{<stylename>}{textbefore} any initial text that should be stripped
#     (works on paragraph styles only)
#  $sf{<stylename>}{textafter} any final text that should be stripped
#     (not yet implemented)  ### TODO
# --------------------
# Bar code control:
my $useBarCodes;
# Whether to look for special SFConvert character styles Bar-i, Bar-b, and Bar-u
# and map them to |i, |b, and |u markers. Possible values:
#    0 or undef:  do not map
#    1 do map, using |i...|r style
#    2 do map, using |i{...} style

# --------------------
# Footnote control:
my $addFootNoteClosing;
# Whether to mark the end of footnotes using a capitalized marker, e.g., F or
#   Possible values = 1 (true) or 0 (false)
my $inlineFootnotes;
# Whether footnotes are inline a la Paratext or a marker left in the text stream 
# and footnotes output later. Possible values
#    0 Leave marker in the stream and output footnotes later
#    1 Footnotes are inline

# --------------------
# Special destinations
# RTF destinations whose content we are not interested in:
#   %skipDest are things the parser doesn't need, so we skip it by forcing parser 
#     to search for matching brace
#   %ignoreDest are things the parser needs, but we don't
my %skipDest = ( map { $_ => 1 } (qw(info *panose colortbl *pnseclvl
    *falt *ts
    *rsidtbl *generator
    header headerl headerr headerf footer footerl footerr footerf *ftnsep *ftnsepc *aftnsep *aftnsepc 
    *template *bkmkstart *bkmkend 
    *listtable *listoverridetable
    *revtbl *atnid *atnauthor *latentstyles)) );
my %ignoreDest = ( map { $_ => 1 } (qw(fonttbl stylesheet *cs)) ) ;
# --------------------
# Parser object
my $p;        # as passed in as first param to handler routines


You may have to do some cleanup if RTF2SFM includes some data not related to stylesheets. This was necessary after several trial-and-error passes on scripture RTF code that came from the FolioViews info base. It included some superfluous reference data, but eventually it was all identified and a CC table was written that was able to clean most of it out of the data.

All this to say that RTF2SFM is a tool that can do a pretty good job of converting RTF code to SFM format when the RTF file is not generated by SF Converter or by the PNG Scripture template with embedded SFMs.


This conversion produces UTF-8-encoded Unicode. All plain keyboard characters (character codes under 128) are already valid UTF-8. Upper ASCII characters such as a-acute “á” (character code 225) will be converted to UTF-8, where they occupy multiple bytes. To get converted SFM files into Paratext you must specify 65001 UNICODE (UTF-8) as the encoding for the project and then import the SFM files.
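
The byte counts mentioned above can be checked with a few lines of Python. This is a small illustration of UTF-8 itself, not of RTF2SFM; the function name is hypothetical.

```python
def utf8_length(code_point):
    """Return how many bytes the given character code occupies in UTF-8."""
    return len(chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    for code in (97, 127, 225):
        encoded = chr(code).encode("utf-8")
        print(code, utf8_length(code), encoded.hex())
```

Codes under 128 stay a single byte with the same value, while code 225 (á) becomes the two bytes C3 A1.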

File List

ActivePerl-5.6.1.nnn-MSWin32-x86.msi: Perl 5.6.1 software [8.6MB file]; RTF to SIL conversion software
RTF2SFM.PLX: Conversion Perl script with sample .INI code
41MAT.RTF: Sample RTF file to convert
ListStylesInDoc.txt: ListStylesInDoc macro code

© 2003-2024 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Writing Systems Technology team (formerly known as NRSI). Read our Privacy Policy. Contact us here.