NRSI: Computers & Writing Systems
How do I encode...?
Glottals
Question: What character should I use to represent the glottal stop?
Answer: People have done a lot of different things in the past. If you want something that looks like a curly quote you should use U+02BC MODIFIER LETTER APOSTROPHE. Many orthographies have used something that looks like the straight quote. There were so many problems with using U+0027 U+02BE Some Saskatchewan orthographies use an upper and lowercase glottal stop. Those are U+0241 LATIN CAPITAL LETTER GLOTTAL STOP and U+0242 LATIN SMALL LETTER GLOTTAL STOP. Of course, the IPA representation is U+0294 LATIN LETTER GLOTTAL STOP.
Diacritics
Question: I want to put a diacritic on a “dotted i” and want to retain the dot on the “i”. Can you add that feature to your fonts?
Answer: The Unicode Standard addresses this in chapter 7 (see
Question: I need a “V”, “t”, “n” and “l” with a macron under each. Unicode does not have these characters. Can you add these to your PUA and get them into Unicode for me, or is there another way I can encode these characters?
Answer: Unicode does have some precomposed characters because they already existed in standards. The Unicode Technical Committee will no longer accept precomposed forms unless there is a very convincing argument. However, each of these can be encoded in Unicode. So, for example, “V” with a macron under it should be encoded as two characters (U+0056 LATIN CAPITAL LETTER V + U+0331 COMBINING MACRON BELOW). The same thing can be done with each of your other characters and, in fact, with any other base + diacritic.
Question: You have left out one crucial Unicode range of four diacritics which are used within the Latin script in the library world: U+FE20..U+FE23. Transliterated Cyrillic records, for example, make heavy use of the first two.
Answer: Originally we made a deliberate decision not to include the combining half marks in our fonts. We consider U+0360 COMBINING DOUBLE TILDE and U+0361 COMBINING DOUBLE INVERTED BREVE to be the preferred characters to use. Thus, to put the U+0361 COMBINING DOUBLE INVERTED BREVE over an “ia”, the preferred encoding would be to put the U+0361 between the “i” and the “a” (i + U+0361 + a).
However, we were convinced that the library world does need this range and so they were added to our Unicode Roman fonts (Doulos SIL ver 4.1 and Charis SIL ver 4.1). Positioning of these may not be perfect.
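The two encodings discussed above (a base letter followed by U+0331, and U+0361 placed between two letters) can be inspected with a short Python sketch. It assumes only the standard `unicodedata` module; the helper name `describe` is hypothetical and exists only for this illustration.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# "V" with a macron under it: the base letter followed by the combining mark.
v_macron_below = "V\u0331"

# Double inverted breve over "ia": the mark sits between the two letters.
ia_tie = "i\u0361a"

for sample in (v_macron_below, ia_tie):
    print(sample, describe(sample))

# There is no precomposed V with macron below, so NFC normalization
# leaves the sequence as two code points.
v_is_stable = unicodedata.normalize("NFC", v_macron_below) == v_macron_below
print(v_is_stable)
```

Whether the mark is drawn in the right place still depends on the font and the rendering engine; the sketch only shows what is stored in the text.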
Question: I need a diacritic on an “i”. Should I use the dotless “i” that I found in Unicode or what should I do? I also need to have a diacritic that will go on the upper case “i” and I can't find different heights for the diacritics.
Answer: This is where Unicode is really, really useful. You no longer need to encode two different versions of an “i” and two different versions of a diacritic. In fact, you should not! If you look at the character properties for the character you have suggested (U+0131 So, you should just use the base character plus the diacritic. (This makes data analysis much simpler as well.) Unicode, along with smart fonts, will automatically handle the dot removal for the “i” and height adjustment for the upper case “i”. For example, i with acute would be encoded as i + U+0301.
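The remark about data analysis can be checked with a small Python sketch. It assumes the standard `unicodedata` module and uses NFC normalization as a stand-in for whatever comparison a real analysis tool performs; the variable names are illustrative only.

```python
import unicodedata

# Ordinary "i" and "I" followed by U+0301 COMBINING ACUTE ACCENT.
i_acute = "i\u0301"
cap_i_acute = "I\u0301"

# The same accent placed on U+0131 LATIN SMALL LETTER DOTLESS I.
dotless_i_acute = "\u0131\u0301"

nfc_lower = unicodedata.normalize("NFC", i_acute)
nfc_upper = unicodedata.normalize("NFC", cap_i_acute)
nfc_dotless = unicodedata.normalize("NFC", dotless_i_acute)

# Base + diacritic is canonically equivalent to the existing
# precomposed letters, so searches and sorts treat them alike.
print(nfc_lower == "\u00ed", nfc_upper == "\u00cd")

# The dotless i has no decomposition and never becomes U+00ED,
# so text encoded with it will not match text encoded with "i".
print(unicodedata.decomposition("\u0131") == "")
print(nfc_dotless == nfc_lower)
```

The same diacritic code point serves both cases here; any difference in height is left to the font.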
In the following example you can see that the diacritic is shifted down if you have characters that have descenders:
Overlays
Question: I need to use a slash “L” (
Answer: Sometimes people get confused about whether to use precomposed or decomposed characters that are in Unicode. A simple rule-of-thumb to go by is that if a character has diacritics (either above or below the character), it can be decomposed. If the character has an “overlay” (superimposed on the character) then the preformed (not precomposed) character should be used. An easy way to find Unicode characters is to look at: Unicode 5.1 Latin and Cyrillic characters – sorted. This document is sorted alphabetically. However, it does not show character properties and decompositions, so if you find you need that information you will need to go to the Unicode book or the In the example we are using (U+0141 LATIN CAPITAL LETTER L WITH STROKE
Question: I cannot find a barred
Answer: Although what you are requesting looks different, fundamentally this is the same character as
Tone
Question: I see that Unicode (and your Doulos SIL font) has individual tone letters (U+02E5..U+02E9), but does not have the tone glides. Can you get those encoded in Unicode? They are very important in linguistic work.
Answer: Unicode can already handle these. You do need a smart font (like Doulos SIL) to make it work. You should type the tone letters in the correct linguistic order and they should become the correct tone glide. For example:
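At the encoding level, a glide is nothing more than a sequence of tone letters; joining them into a contour is the font's job. The Python sketch below lists the five tone letters and builds two sample sequences. It assumes the standard `unicodedata` module, and the names `falling` and `rising` are illustrative labels rather than anything defined by Unicode or by the fonts.

```python
import unicodedata

# U+02E5..U+02E9: the five tone letters, from extra-high to extra-low.
tone_letters = [chr(cp) for cp in range(0x02E5, 0x02EA)]
for ch in tone_letters:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# A glide is typed as its tone letters in linguistic order.
falling = "\u02E5\u02E9"  # extra-high followed by extra-low
rising = "\u02E9\u02E5"   # extra-low followed by extra-high

# The order is part of the data: the two sequences are different strings,
# and normalization does not reorder or merge the letters.
print(falling != rising)
print(unicodedata.normalize("NFC", falling) == falling)
```

With a font that lacks the smart behavior, the same text simply shows the individual tone letters side by side.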
Question: How do I know which version of the schwa to use? There is U+0259
Answer: This one will rise up and bite you if you are not careful! This is where looking at the documentation is important. If you look at U+0259 you will see:
There are a number of useful bits of information here. Firstly, you see that it tells you U+018E Another interesting test is to type both of the schwas into a word processor (like Word). Select them both and convert them to upper case. You should see two different forms of the upper case schwa. This shows you how important it is to match the lower case character (which looks exactly the same) with the correct upper case character (which looks significantly different).
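The word-processor test can also be run in Python, whose `str.upper()` follows the Unicode case mappings. The sketch assumes that the second lower case character in question is U+01DD LATIN SMALL LETTER TURNED E, which is the character paired with U+018E in the Unicode data; the variable names are illustrative.

```python
import unicodedata

# Two lower case letters that look alike in most fonts.
lower_forms = ["\u0259", "\u01DD"]

# Map each one to its upper case partner using the Unicode case mappings.
case_pairs = {ch: ch.upper() for ch in lower_forms}

for lower, upper in case_pairs.items():
    print(
        f"U+{ord(lower):04X} {unicodedata.name(lower)} -> "
        f"U+{ord(upper):04X} {unicodedata.name(upper)}"
    )
```

Mixing the two lower case letters in one data set therefore produces two different capitals as soon as the text is upper-cased.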
In this example you want to make sure that if you are using U+018F
Question: I've noticed that when I'm looking for phonetic characters, not everything I want is in the IPA extensions. For example, the beta which is used for a voiced bilabial fricative is, I believe, supposed to be encoded as U+03B2
Answer: You are right about the voiced bilabial fricative being encoded as U+03B2 The
Question: I want an open o with the serif at the top. I see that Unicode now has U+2183
Answer: U+2183
Question: I want a handwritten style a. Unicode has U+0251
Answer: U+0251
Page History
2009-02-25 LP: added glottal question
© 2003-2009 SIL International, all rights reserved, unless otherwise noted elsewhere on this page.