Test charts for tonal pinyin in Unicode Web pages
various coding methods for tonal pinyin in Unicode Web pages
Putting Pinyin with tone marks on Web pages can be -- or at least should be able to be -- achieved through several approaches. But at the present level of support by Web browsers, fonts, and operating systems, some methods are more reliable than others. This page presents the different methods and briefly discusses their strengths and weaknesses.
If you just want to convert some text that has tone numbers and don't care about the technicalities of how it gets done, you could skip this page and go directly to the tone numbers to tone marks converter. Alternatively, if you have text with tone marks and want to change this into something usable on Web pages, see convert special characters to Web-friendly Unicode numbers. (I'll put an improved converter up soon.)
For a summary of the information here, see my guidelines for using tonal pinyin on Web pages.
Remember: Even if everything in the charts below looks correct on your system, that doesn't mean it will also look correct on someone else's system, which might use a different operating system, different-language operating system, or other browser, and have different fonts installed. I'd appreciate hearing from visitors to this site -- especially those of you with Macs and Linux boxes, and those who use little-known browsers -- what does (and, perhaps more importantly, doesn't) work for you.
the "special" characters needed for tonal pinyin
ā á ǎ à ē é ě è ī í ǐ ì ō ó ǒ ò ū ú ǔ ù ǖ ǘ ǚ ǜ
coding methods
- Unicode combining diacritical marks with lower-case letters (decimal)
- Unicode combining diacritical marks with lower-case letters (hex)
- Unicode combining diacritical marks with upper-case letters (decimal)
- Unicode combining diacritical marks with upper-case letters (hex)
- lower-case individually numbered entities (decimal)
- lower-case individually numbered entities (hex)
- upper-case individually numbered entities (decimal)
- upper-case individually numbered entities (hex)
- lower-case named entities
- upper-case named entities
- rounded ɑ with Unicode combining diacritical marks (decimal)
- rounded ɑ with Unicode combining diacritical marks (hex)
Combining diacritical marks with lower-case letters (decimal)
What's supposed to happen with these is that the mark indicated by the code will be placed on the letter immediately preceding the code. Thus, as you can see from the code columns below, all that should be needed is one code number per tone, regardless of the vowel. But although IE on a Windows system does well with these, other browsers and systems don't render them as well, especially the marks over i's and ü's. Thus, support from fonts, Web browsers, and operating systems isn't yet good enough for these to be relied upon. Although this will eventually be the approach to use, for now we need to look to other methods. Previously I thought that these could safely be used for second- and fourth-tone a's, e's, o's, and u's; I was mistaken, as I discovered when I finally saw my site on a Mac. (Macs are hard to come by here in Taiwan.)
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
a | ā | a&#772; | á | a&#769; | ǎ | a&#780; | à | a&#768; |
e | ē | e&#772; | é | e&#769; | ě | e&#780; | è | e&#768; |
i | ī | i&#772; | í | i&#769; | ǐ | i&#780; | ì | i&#768; |
o | ō | o&#772; | ó | o&#769; | ǒ | o&#780; | ò | o&#768; |
u | ū | u&#772; | ú | u&#769; | ǔ | u&#780; | ù | u&#768; |
ü | ǖ | ü&#772; | ǘ | ü&#769; | ǚ | ü&#780; | ǜ | ü&#768; |
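As a rough illustration of the "one code number per tone" idea, here is a minimal Python sketch that builds decimal codes of this kind by appending a numeric reference for the combining mark to whatever base letter it is given. The names `TONE_MARKS` and `combining_ncr` are made up for this example; the code points (U+0304, U+0301, U+030C, U+0300) are the combining macron, acute, caron, and grave from the Unicode combining diacritical marks block.

```python
# Combining marks for the four tones, keyed by tone number.
TONE_MARKS = {
    1: 0x0304,  # combining macron
    2: 0x0301,  # combining acute accent
    3: 0x030C,  # combining caron
    4: 0x0300,  # combining grave accent
}


def combining_ncr(letter: str, tone: int) -> str:
    """Return the base letter followed by a decimal numeric character
    reference for the combining mark of the given tone (1-4)."""
    return f"{letter}&#{TONE_MARKS[tone]};"


# The same four mark codes serve every vowel, upper or lower case.
codes_for_a = [combining_ncr("a", tone) for tone in (1, 2, 3, 4)]
codes_for_o = [combining_ncr("O", tone) for tone in (1, 2, 3, 4)]
```

Here `combining_ncr("a", 1)` gives `a&#772;`. Whether a browser then draws the mark properly is a separate matter, as discussed above.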
Combining diacritical marks with lower-case letters (hex)
These are the hex versions of the combining diacritical marks above. They behave in exactly the same way as the decimal versions.
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
a | ā | a&#x304; | á | a&#x301; | ǎ | a&#x30C; | à | a&#x300; |
e | ē | e&#x304; | é | e&#x301; | ě | e&#x30C; | è | e&#x300; |
i | ī | i&#x304; | í | i&#x301; | ǐ | i&#x30C; | ì | i&#x300; |
o | ō | o&#x304; | ó | o&#x301; | ǒ | o&#x30C; | ò | o&#x300; |
u | ū | u&#x304; | ú | u&#x301; | ǔ | u&#x30C; | ù | u&#x300; |
ü | ǖ | ü&#x304; | ǘ | ü&#x301; | ǚ | ü&#x30C; | ǜ | ü&#x300; |
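To see that a hex reference and a decimal reference really are two spellings of the same thing, a small Python check along these lines may help. It only decodes the markup with the standard library's `html.unescape`; it says nothing about how a browser will draw the result. The variable names are just for illustration.

```python
import html

decimal_form = "a&#772;"  # decimal reference to U+0304, combining macron
hex_form = "a&#x304;"     # the same code point written in hex

# Both spellings decode to the letter a followed by U+0304.
decoded_decimal = html.unescape(decimal_form)
decoded_hex = html.unescape(hex_form)
same = decoded_decimal == decoded_hex

# List the code points that actually end up in the text.
code_points = [f"U+{ord(ch):04X}" for ch in decoded_hex]
```

After this runs, `same` is `True` and `code_points` is `['U+0061', 'U+0304']`.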
Combining diacritical marks with upper-case letters (decimal)
Again, IE under Windows (but not Mac) gets everything right. But other browsers and systems place the tone marks through rather than fully above the letters, resulting in a mess. None of these can be relied upon, given the current state of browser support. The problem may lie in the fonts themselves rather than in the non-IE browsers. Either way, these can't be used for now, which is a shame because this approach is so wonderfully simple and easy to read even in the HTML itself.
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
A | Ā | A&#772; | Á | A&#769; | Ǎ | A&#780; | À | A&#768; |
E | Ē | E&#772; | É | E&#769; | Ě | E&#780; | È | E&#768; |
I | Ī | I&#772; | Í | I&#769; | Ǐ | I&#780; | Ì | I&#768; |
O | Ō | O&#772; | Ó | O&#769; | Ǒ | O&#780; | Ò | O&#768; |
U | Ū | U&#772; | Ú | U&#769; | Ǔ | U&#780; | Ù | U&#768; |
Ü | Ǖ | Ü&#772; | Ǘ | Ü&#769; | Ǚ | Ü&#780; | Ǜ | Ü&#768; |
Combining diacritical marks with upper-case letters (hex)
These are the hex versions of the combining diacritical marks above. They behave in exactly the same way as the decimal versions.
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
A | Ā | A&#x304; | Á | A&#x301; | Ǎ | A&#x30C; | À | A&#x300; |
E | Ē | E&#x304; | É | E&#x301; | Ě | E&#x30C; | È | E&#x300; |
I | Ī | I&#x304; | Í | I&#x301; | Ǐ | I&#x30C; | Ì | I&#x300; |
O | Ō | O&#x304; | Ó | O&#x301; | Ǒ | O&#x30C; | Ò | O&#x300; |
U | Ū | U&#x304; | Ú | U&#x301; | Ǔ | U&#x30C; | Ù | U&#x300; |
Ü | Ǖ | Ü&#x304; | Ǘ | Ü&#x301; | Ǚ | Ü&#x30C; | Ǜ | Ü&#x300; |
Lower-case individually numbered entities (decimal)
These are fully reliable as long as the font used supports these characters. Even NN4.7 does fairly well.
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
a | ā | &#257; | á | &#225; | ǎ | &#462; | à | &#224; |
e | ē | &#275; | é | &#233; | ě | &#283; | è | &#232; |
i | ī | &#299; | í | &#237; | ǐ | &#464; | ì | &#236; |
o | ō | &#333; | ó | &#243; | ǒ | &#466; | ò | &#242; |
u | ū | &#363; | ú | &#250; | ǔ | &#468; | ù | &#249; |
ü | ǖ | &#470; | ǘ | &#472; | ǚ | &#474; | ǜ | &#476; |
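If you'd rather compute these numbers than look them up, one possible approach (a sketch, not the method used to make this page) is to let Unicode normalization do the work: join the base letter and the combining tone mark, normalize to NFC, and take the number of the single character that results. The names `TONE_MARKS` and `precomposed_ncr` are invented for this example.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}


def precomposed_ncr(letter: str, tone: int) -> str:
    """Return a decimal numeric character reference for the single
    precomposed character equal to letter + tone mark."""
    composed = unicodedata.normalize("NFC", letter + TONE_MARKS[tone])
    if len(composed) != 1:
        # NFC found no single character for this combination.
        raise ValueError(f"no precomposed character for {letter!r}, tone {tone}")
    return f"&#{ord(composed)};"


third_tone_a = precomposed_ncr("a", 3)       # ǎ
second_tone_u_umlaut = precomposed_ncr("ü", 2)  # ǘ
```

Here `third_tone_a` is `&#462;` and `second_tone_u_umlaut` is `&#472;`. Swapping the format string for `f"&#x{ord(composed):X};"` would give the hex forms instead.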
Lower-case individually numbered entities (hex)
These hex forms behave exactly like the decimal forms.
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
a | ā | &#x101; | á | &#xE1; | ǎ | &#x1CE; | à | &#xE0; |
e | ē | &#x113; | é | &#xE9; | ě | &#x11B; | è | &#xE8; |
i | ī | &#x12B; | í | &#xED; | ǐ | &#x1D0; | ì | &#xEC; |
o | ō | &#x14D; | ó | &#xF3; | ǒ | &#x1D2; | ò | &#xF2; |
u | ū | &#x16B; | ú | &#xFA; | ǔ | &#x1D4; | ù | &#xF9; |
ü | ǖ | &#x1D6; | ǘ | &#x1D8; | ǚ | &#x1DA; | ǜ | &#x1DC; |
Upper-case individually numbered entities (decimal)
Browser support is good here. Since no pinyin syllables begin with i, u, or ü, these letters can be ignored unless someone is going to write in ALL CAPS.
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
A | Ā | &#256; | Á | &#193; | Ǎ | &#461; | À | &#192; |
E | Ē | &#274; | É | &#201; | Ě | &#282; | È | &#200; |
I | Ī | &#298; | Í | &#205; | Ǐ | &#463; | Ì | &#204; |
O | Ō | &#332; | Ó | &#211; | Ǒ | &#465; | Ò | &#210; |
U | Ū | &#362; | Ú | &#218; | Ǔ | &#467; | Ù | &#217; |
Ü | Ǖ | &#469; | Ǘ | &#471; | Ǚ | &#473; | Ǜ | &#475; |
Upper-case individually numbered entities (hex)
These hex forms behave exactly like the decimal forms.
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
A | Ā | &#x100; | Á | &#xC1; | Ǎ | &#x1CD; | À | &#xC0; |
E | Ē | &#x112; | É | &#xC9; | Ě | &#x11A; | È | &#xC8; |
I | Ī | &#x12A; | Í | &#xCD; | Ǐ | &#x1CF; | Ì | &#xCC; |
O | Ō | &#x14C; | Ó | &#xD3; | Ǒ | &#x1D1; | Ò | &#xD2; |
U | Ū | &#x16A; | Ú | &#xDA; | Ǔ | &#x1D3; | Ù | &#xD9; |
Ü | Ǖ | &#x1D5; | Ǘ | &#x1D7; | Ǚ | &#x1D9; | Ǜ | &#x1DB; |
Lower-case named entities
Using "acute" and "grave" for the second- and fourth-tone marks, respectively, makes for easier-to-read HTML than numbered entities -- at least for humans. But HTML 4 has no entity names for the first- and third-tone vowels, nor for ü with any tone mark. Other than that, browser support is good.
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
a | n/a | n/a | á | &aacute; | n/a | n/a | à | &agrave; |
e | n/a | n/a | é | &eacute; | n/a | n/a | è | &egrave; |
i | n/a | n/a | í | &iacute; | n/a | n/a | ì | &igrave; |
o | n/a | n/a | ó | &oacute; | n/a | n/a | ò | &ograve; |
u | n/a | n/a | ú | &uacute; | n/a | n/a | ù | &ugrave; |
ü | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
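One way to check which pinyin vowels have a name at all is to look them up in Python's `html.entities.codepoint2name` table, which covers the HTML 4.01 entity set. This is only a sketch; `named_entity` is a name made up for the example, and a different entity list would of course give different answers.

```python
from html.entities import codepoint2name


def named_entity(char: str):
    """Return the HTML 4.01 named entity for a character,
    or None if that entity set has no name for it."""
    name = codepoint2name.get(ord(char))
    return f"&{name};" if name else None


second_tone = named_entity("á")   # has a name
first_tone = named_entity("ā")    # no name in this set
u_umlaut_tone = named_entity("ǘ")  # no name in this set
```

Here `second_tone` is `&aacute;`, while `first_tone` and `u_umlaut_tone` are both `None`, which matches the n/a cells in the chart. Plain ü does have a name (`&uuml;`); it is only the toned forms of ü that are missing.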
Upper-case named entities
(Same comments as above for lower-case named entities.)
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
A | n/a | n/a | Á | &Aacute; | n/a | n/a | À | &Agrave; |
E | n/a | n/a | É | &Eacute; | n/a | n/a | È | &Egrave; |
I | n/a | n/a | Í | &Iacute; | n/a | n/a | Ì | &Igrave; |
O | n/a | n/a | Ó | &Oacute; | n/a | n/a | Ò | &Ograve; |
U | n/a | n/a | Ú | &Uacute; | n/a | n/a | Ù | &Ugrave; |
Ü | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
rounded a's (decimal)
It is sometimes claimed that in Hanyu Pinyin the letter "a" should not be written in the style of most fonts but instead in a rounded manner: ɑ. In most cases, I am all for following the rules. But if this ɑ is a rule, I strongly oppose it as unnecessary and a sin against good typography. (And it may not be a real rule anymore. I would appreciate hearing from anyone with a definitive answer.) Thus, these are for reference only. I do not recommend using this style of the letter "a". The only way to achieve tone marks for this style of letter is to use combining diacritical marks, which, as I discuss at the top of this page, cannot be relied upon to work properly on Web pages.
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
ɑ | ɑ̄ | ɑ&#772; | ɑ́ | ɑ&#769; | ɑ̌ | ɑ&#780; | ɑ̀ | ɑ&#768; |
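The claim that combining marks are the only route for the rounded ɑ can be checked with Unicode normalization: if NFC leaves the letter and the mark as two separate code points, there is no single precomposed character to use instead. The following Python sketch does that check; `has_precomposed` is a name invented for the example.

```python
import unicodedata

ALPHA = "\u0251"  # the rounded ɑ (decimal 593)
MARKS = ["\u0304", "\u0301", "\u030C", "\u0300"]  # macron, acute, caron, grave


def has_precomposed(base: str, mark: str) -> bool:
    """True if NFC folds base + combining mark into one code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


# Ordinary a has a precomposed form for every tone; the rounded ɑ has none.
plain_a = [has_precomposed("a", mark) for mark in MARKS]
rounded_a = [has_precomposed(ALPHA, mark) for mark in MARKS]
alpha_name = unicodedata.name(ALPHA)
```

Here `plain_a` is all `True` and `rounded_a` is all `False`, and `alpha_name` is `LATIN SMALL LETTER ALPHA`.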
rounded a's (hex)
As noted immediately above, these are for reference only. I do not recommend using this style of the letter "a".
letter | first tone | code | second tone | code | third tone | code | fourth tone | code |
---|---|---|---|---|---|---|---|---|
ɑ | ɑ̄ | ɑ&#x304; | ɑ́ | ɑ&#x301; | ɑ̌ | ɑ&#x30C; | ɑ̀ | ɑ&#x300; |
Last updated: March 5, 2005