How to put Pinyin with tone marks on Web pages.
If you just want to convert some text that has tone numbers and don't care about the technicalities of how it gets done, you could skip this page and go directly to the tone numbers to tone marks converter. Alternatively, if you have text with tone marks and want to change it into something usable on Web pages, see convert special characters to Web-friendly Unicode numbers. (I'll put an improved converter up soon.)
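For readers curious about what a tone numbers to tone marks conversion involves, here is a minimal Python sketch. It is not the converter linked above. It assumes input like "Han4yu3", with ü typed as "v" or "u:" and tone 5 for the neutral tone. The names `convert`, `mark_syllable`, `TONE_MARKS` and `SYLLABLE` are hypothetical. It applies the standard placement rule (a or e takes the mark, then the o of "ou", otherwise the last vowel) and outputs precomposed characters only.

```python
import re

# Precomposed vowels for tones 1-4 (index 0 = tone 1).
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}

# A run of letters containing at least one vowel, followed by a tone digit.
SYLLABLE = re.compile(r"([A-Za-zÜü:]*[aeiouvüAEIOUÜ][A-Za-zÜü:]*)([1-5])")


def mark_syllable(syllable: str, tone: int) -> str:
    """Put a tone mark on one Pinyin syllable; tone 5 means neutral (no mark)."""
    # Assumption: ü may be typed as "v" or "u:" in tone-number input.
    s = syllable.replace("u:", "ü").replace("v", "ü")
    if tone == 5:
        return s
    low = s.lower()
    # Standard rule: a or e takes the mark; in "ou" the o does;
    # otherwise the last vowel does (so "gui" -> "guì", "liu" -> "liú").
    if "a" in low:
        pos = low.index("a")
    elif "e" in low:
        pos = low.index("e")
    elif "ou" in low:
        pos = low.index("o")
    else:
        vowels = [i for i, ch in enumerate(low) if ch in TONE_MARKS]
        if not vowels:
            return s
        pos = vowels[-1]
    marked = TONE_MARKS[low[pos]][tone - 1]
    if s[pos].isupper():
        marked = marked.upper()
    return s[:pos] + marked + s[pos + 1:]


def convert(text: str) -> str:
    """Replace every tone-numbered syllable in text, e.g. "Han4yu3" -> "Hànyǔ"."""
    return SYLLABLE.sub(lambda m: mark_syllable(m.group(1), int(m.group(2))), text)
```

For example, `convert("Han4yu3 Pin1yin1")` returns "Hànyǔ Pīnyīn". Letter runs with no vowel, such as "mp3", are left untouched.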
Displaying Pinyin without tone marks needs no explanation, because apart from ü, which can be coded as &uuml;, Pinyin uses only letters found in English. (Moreover, ü is seldom needed in Pinyin.) But if you want to display tone marks (and in many cases you should), then Unicode is the way to go.
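In order to have a Web site with tonal Pinyin work across as many operating systems and browsers as possible, you need to use all of the following:
- proper "charset" declaration
- tonal Pinyin and CSS
- optimal coding for individual characters
In addition, I have two recommendations:
- avoid bold in Pinyin text
- avoid italic in Pinyin text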
proper "charset" declaration
Pages with tonal Pinyin need to be in Unicode, not in Big5 (used primarily in Taiwan) or GuoBiao (used primarily in China). Using any "charset" other than utf-8 (Unicode) is asking for trouble.
The only way in which utf-8 could be a problem is if your page uses rare characters that appear in Big5 or GB but do not yet appear in Unicode.
Here's what you should have in your code in the "head" of your Web page's HTML:
<meta http-equiv="Content-Type" content="text/css; charset=utf-8" />
Note: If your page contains Chinese characters and uses a charset other than utf-8, changing the declaration (say, from "big5" to "utf-8") is not enough to solve your problem; you will also need to convert the Chinese characters themselves, either by re-saving the file as UTF-8 or by replacing the characters with Unicode numeric character references (NCRs).
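Both conversions can be scripted. The following is a minimal Python sketch, assuming the old page was saved in Big5 (pass "gbk" or "gb18030" for GB files). The function names are hypothetical; the codecs and the "xmlcharrefreplace" error handler are part of Python's standard library.

```python
from pathlib import Path


def recode_to_utf8(data: bytes, source_encoding: str = "big5") -> bytes:
    """Decode Big5 (or GB) bytes and re-encode the same text as UTF-8."""
    # Python's codec names include "big5", "gb2312", "gbk" and "gb18030";
    # decoding raises UnicodeDecodeError if the bytes are not valid in that encoding.
    return data.decode(source_encoding).encode("utf-8")


def recode_file(src: str, dst: str, source_encoding: str = "big5") -> None:
    """Write a UTF-8 copy of a Big5/GB file (the charset declaration still needs editing)."""
    Path(dst).write_bytes(recode_to_utf8(Path(src).read_bytes(), source_encoding))


def to_ncrs(text: str) -> str:
    """Replace every non-ASCII character with a decimal NCR, e.g. 中 -> &#20013;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
```

Re-saving as UTF-8 keeps the source readable in an editor, while NCRs produce pure-ASCII markup that survives being pasted into tools that mangle non-ASCII text.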
Tonal Pinyin and CSS
CSS is the best thing to hit the Web in years. If you're a webmaster and don't know CSS, it's time to learn. The basics are simple, and even they will change the way you approach making Web sites for the better.
Because not all fonts have the necessary characters, if you want to put tonal Pinyin on a Web page you should include a font-family declaration in your style sheet. Here's what I use:
.py { font-family: "arial unicode ms", "lucida sans unicode", sans-serif !important; font-family: serif; }
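The second font-family declaration is an IE hack: within a single rule, older versions of IE do not honor "!important" and simply use the last declaration, so IE gets serif while other browsers get the font list.
(I can probably come up with a better hack than that, since Firefox now behaves largely as IE does, at least in the good ways.)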
Accordingly, Pinyin text in your Web page needs to be assigned to a class. I use "py". Thus, to have "Hànyǔ Pīnyīn" in the middle of a phrase, you would use:
to have "<span class="py">Hànyǔ Pīnyīn</span>" in the middle of a phrase....
optimal coding for individual characters
This part can get a little tricky. See my test charts for tonal Pinyin in Unicode Web pages.
Guidelines:
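- Use numbered entities (numeric character references such as &#468; for ǔ) for each accented letter, along with named entities (such as &agrave; for à) if you like.
- Do not use combining diacritical marks (a base letter followed by a separate accent character, such as "a" plus U+030C); use the precomposed characters instead.
- Use the style of letter "a" that comes with the font. Do not use rounded a's (such as ɑ, U+0251).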
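The three guidelines can be applied mechanically to existing text. The following is a minimal Python sketch, assuming that "rounded a" means U+0251 and that numbered entities alone (no named ones) are acceptable. `pinyin_to_ncrs` is a hypothetical name; `unicodedata.normalize` and `unicodedata.combining` are standard-library calls.

```python
import unicodedata


def pinyin_to_ncrs(text: str) -> str:
    """Return text with each non-ASCII letter as one numbered entity (&#NNN;)."""
    # Assumption: a "rounded a" means U+0251 LATIN SMALL LETTER ALPHA; swap it
    # for the font's ordinary "a" before composing.
    text = text.replace("\u0251", "a")
    # NFC merges a base letter plus combining mark (e.g. "u" + U+030C) into the
    # single precomposed character (U+01D4) wherever Unicode has one.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if unicodedata.combining(ch):
            # No precomposed form exists, so the text still needs fixing by hand.
            raise ValueError(f"leftover combining mark U+{ord(ch):04X}")
        out.append(ch if ord(ch) < 128 else f"&#{ord(ch)};")
    return "".join(out)
```

For example, `pinyin_to_ncrs("Pīnyīn")` gives "P&#299;ny&#299;n", and text typed with combining marks comes out the same as text typed with precomposed letters.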
Additional recommendations
Avoid bold in Pinyin text
Avoid using <b>, <strong>, or a CSS font-weight above 500 if your text has any umlauts (ü), because in most of the fonts needed for tonal Pinyin the dots will merge together (compare standard weight, ü ǖ ǘ ǚ ǜ, and bold, ü ǖ ǘ ǚ ǜ). For bold-looking text, instead use the relatively heavy-looking Lucida Sans Unicode rather than the thinner Arial Unicode MS as the first choice in your font-family declaration.
Avoid italic in Pinyin text
Italic works a little better than bold, but it is still best avoided, especially for long runs of text.
Last updated: March 13, 2005