Short URL: http://scripts.sil.org/Encoding
Encoding
Character set encoding basics
Peter Constable, 2001-06-13; 113138 reads
To understand technologies for working with multilingual and multi-script text data, we need to start with
an understanding of character encoding. Systems for working with text involve a collection of processes that
work together: creating and editing text, presenting it, sorting it, laying out paragraphs, wrapping at
line breaks, and so on. Character encoding is what ties all of these processes together.
Computer systems employ a wide variety of character encodings. The most important of these for us is Unicode.
It is also important for us to understand other encodings, however, and how they relate to Unicode. In this
section, I want to look at some basic concepts that relate to all encodings, and also give an overview of
legacy encodings and their importance for us.
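As a rough illustration of how legacy encodings relate to Unicode, the sketch below decodes one byte value under three legacy codepages and reports the Unicode code point each one maps to. The choice of byte (0xE9) and of codepages (cp1252, cp1251, cp437) is an assumption made for this example and is not taken from the article; the function names are hypothetical.

```python
# A minimal sketch: the same byte means different characters in different
# legacy encodings, while Unicode gives each character its own code point.
# The byte value and the codepage list are illustrative assumptions.
RAW = bytes([0xE9])
LEGACY_CODECS = ["cp1252", "cp1251", "cp437"]


def describe(raw: bytes) -> dict:
    """Map each legacy codec name to the Unicode code point it decodes `raw` to."""
    result = {}
    for codec in LEGACY_CODECS:
        ch = raw.decode(codec)          # interpret the byte under this codepage
        result[codec] = f"U+{ord(ch):04X}"
    return result


if __name__ == "__main__":
    for codec, code_point in describe(RAW).items():
        print(f"{codec}: byte 0xE9 -> {code_point}")
```

Under these assumptions, a file containing that byte cannot be read correctly without knowing which codepage produced it, whereas text stored in a Unicode encoding form such as UTF-8 carries no such ambiguity.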
XSEM: XML Scripture Encoding Model
Dennis Drescher, 2001-09-06; 26990 reads
The XML Scripture Encoding Model (XSEM), an SIL project, is a markup language that conforms to the Extensible
Markup Language (XML) version 1.0 standard. On this page you will find information about the model and the
project.
Character Encoding Choices in Paratext 6
Peter Constable, 2003-04-29; 15969 reads
This article discusses options available to users for how their data can be encoded in Paratext 6, and looks
at pros and cons of those options.
Windows and Codepages
Martin Hosken, 1997-12-29; 37703 reads
This document examines how Windows 95 handles multi-lingual computing. It looks at Languages, Codepages,
Locales, Unicode and Fonts with particular reference to their support in Windows 95.
An alternative title for this document might be: “How to add a new script to Windows 95 and fail”.
© 2003-2012 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.
Provided by SIL's Non-Roman Script Initiative.