Possible workaround for the encoding problem

Earlier this evening, as I was browsing through some old posts here on Pinyin News, I was startled to notice in one post the presence of some Chinese characters that were not scrambled but correct (e.g., here). I investigated.

The difference between that page and most others is that it employed Unicode numerical character references (NCRs) for Chinese characters and diacritics.

Entering such code appears to be a stable way of getting around the hack. Thus, for example, in order for the characters “漢字” to appear in the published post correctly despite the encoding hack into Swedish, they would need to be entered in the post’s HTML as “&#28450;&#23383;” rather than in the more human-friendly direct form of “漢字”. The same goes for most diacritics.
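For anyone who would rather script the conversion than type the codes by hand, here is a minimal Python sketch. The function name to_ncr is my own invention for illustration; it relies only on Python's built-in "xmlcharrefreplace" error handler, which swaps each non-ASCII character for its decimal NCR. I normalize to NFC first on the assumption that precomposed letters (one code per accented vowel) are what you want.

```python
import unicodedata


def to_ncr(text: str) -> str:
    """Return text with every non-ASCII character written as a decimal NCR."""
    # Compose base letter + combining mark into a single code point where possible.
    composed = unicodedata.normalize("NFC", text)
    # "xmlcharrefreplace" turns each unencodable character into &#NNNN;
    return composed.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ncr("漢字"))  # &#28450;&#23383;
print(to_ncr("Wǒ ài Hànyǔ Pīnyīn!"))
```

Plain ASCII passes through untouched, so it should be safe to run a whole post through the function.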

So, although this isn’t a real solution, at least I should be able to make new posts here render correctly; and, no less importantly, these new posts should remain correct even after the encoding problem for the rest of the blog is finally fixed, since NCRs are ASCII-friendly and thus shouldn’t become scrambled.
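To see why NCRs should survive where raw characters don't, here is a small Python sketch. It simulates one common kind of scrambling, UTF-8 bytes read back as Latin-1. That is an assumption for illustration only; I'm not claiming it is exactly what happened to this blog. The helper name misdecode is hypothetical.

```python
import html


def misdecode(text: str) -> str:
    """Simulate a charset mix-up: store as UTF-8, read back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


raw = "漢字"
ncr = "&#28450;&#23383;"

print(misdecode(raw) == raw)      # False: the raw characters come back scrambled
print(misdecode(ncr) == ncr)      # True: the NCR form is pure ASCII and is unchanged
print(html.unescape(ncr) == raw)  # True: decoding the NCRs restores the characters
```

Because every character in an NCR falls within ASCII, the bytes are the same in UTF-8, Latin-1, and most other common encodings, so there is nothing for a mix-up to scramble.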

This should also mean that your comments can again safely include Chinese characters, etc., as long as you use NCRs as well, which can be accomplished with relatively little hassle by employing Pinyin.info’s online tool to convert Chinese characters and Pinyin diacritics to Unicode numerical character references. Check it out.

Look, Ma, no GIFs or PNGs!

Wǒ ài Hànyǔ Pīnyīn!
我愛漢語拼音!

PRC’s official rules for Pinyin: 2012 revision — in traditional Chinese characters

Last week I put online China’s official rules for Hanyu Pinyin, the 2012 revision (GB/T 16159-2012). I’ve now made a traditional-Chinese-character version of those rules for Pinyin.

Eventually I’ll also issue versions in Pinyin and English.

[Image: cover of GB/T 16159-2012, altered to use traditional Chinese characters]
(Note: The image above is of course Photoshopped. I altered the cover of the PRC standard simply to provide an illustration in traditional Chinese characters for this post.)

OK, so here’s what I’m gonna do

The encoding problem caused by the hack still isn’t fixed. This means that Chinese characters and Pinyin with tone marks still don’t appear properly on this blog (but they’re fine on pages in the rest of Pinyin.info). But, still, there are some things I’d like to let people know about, including an important announcement coming up soon. So I’m going to start posting some things, even though that means no Hanzi or tonal Pinyin for at least the near future. (Don’t forget: That means Hanzi won’t work in your comments here.) Fortunately, most of the time Pinyin doesn’t really need tone marks.

Without Hanzi and tone marks it’s more difficult to write about Chinese characters and Pinyin, which are, er, only the main topics of the site. But I’ll do what I can. Anyway, why let Victor Mair have all the fun?

So until the encoding issue is resolved y’all can expect a relatively large number of posts catching up on Pinyin-friendly fonts, a few posts covering news and announcements, and probably at least a little of the bile that you’ve come to expect from this site — unless, of course, during the years I’ve let this blog go fallow public signage has all been fixed, the authorities are finally using Pinyin correctly, and people who ought to know better have stopped spouting complete nonsense about Chinese characters. Heh. We’ll see.

Zhou Youguang writes about Pinyin.info

I’d like to share a note that Zhou Youguang, the father of Pinyin, very generously wrote to me last week.

感谢Mark Swofford 先生的拼音网站,把拼音用做学习中文的工具.我祝贺Swofford 先生的工作获得成功!

语言使人类别于禽兽,
文字使文明别于野蛮,
教育使先进有别于落后。

周有光
2012-03-02
时年107岁

Gǎnxiè Mark Swofford xiānsheng de pīnyīn wǎngzhàn, bǎ pīnyīn yòngzuò xuéxí Zhōngwén de gōngjù. Wǒ zhùhè Swofford xiānsheng de gōngzuò huòdé chénggōng!

Yǔyán shǐ rénlèi bié yú qínshòu,
wénzì shǐ wénmíng bié yú yěmán,
jiàoyù shǐ xiānjìn bié yú luòhòu.

Zhōu Yǒuguāng
2012-03-02
shí nián 107 suì

Five years of Pinyin News

I can’t believe I’ve been doing this so long….

And even at the age of five this blog is considerably younger than most of the rest of Pinyin.info, which I built largely by hand using a text editor. Oh, those were the days. Fortunately, I now have a paid staff of dozens to handle most of the work you see here and a lot more that goes on behind the scenes.

Well, OK, I made that up. It’s still just me, though I really am often busy behind the scenes on Pinyin-related projects or pestering government officials. But it’s a living hobby.

Wiki for collaborative Pinyin projects

I have long wanted to expand the range of materials available in and about Pinyin. Possibilities for projects include:

  • Hanyu Pinyin subtitles for movies and videos
  • Hanyu Pinyin versions of Mandarin plays (for example, Cháguǎn, by Lǎo Shě)
  • translations into Mandarin (Hanzi and/or Pinyin) of parts of this site

I can do a lot of the work — in fact, as is my habit, I’ve begun all sorts of such projects but haven’t finished them — but can’t do all of it myself. So I’ve been mulling the idea of setting up a Pinyin-related wiki here on Pinyin.info or perhaps on a spinoff site I set up, which would allow you, o reader, to get involved (a little or a lot, depending on your desire and amount of free time).

I’m thinking that texts could be worked on with the aid of Wenlin, since even contributors without the full version of that enormously useful program could use its free demo to select disambiguation choices in cases of word-parsing ambiguities or characters with multiple pronunciations.

For example, if one were using Wenlin to convert the following into Pinyin,

我在朦胧中,眼前展开一片海边碧绿的沙地来,上面深蓝的天空中挂着一轮金黄的圆月。我想:希望本是无所谓有,无所谓无的。这正如地上的路;其实地上本没有路,走的人多了,也便成了路。

one would first need to choose between potentially ambiguous word boundaries

|我 | 在 | 朦胧 | 中,眼前 | 展开 | 一 | 片 | 海边 | 【◎Fix:◎碧绿 | 的;◎碧 | 绿的】 | 沙地 | 来,上面 | 深蓝 | 的 | 【◎Fix:◎天空 | 中;◎天 | 空中】 | 挂着 | 一 | 轮 | 金黄 | 的 | 圆月。我 | 想:希望 | 本 | 是 | 无所谓 | 有,无所谓 | 无 | 的。这 | 正如 | 【◎Fix:◎地上 | 的;◎地 | 上的】 | 路;其实 | 地上 | 本 | 没有 | 路,走 | 的 | 人 | 多 | 了,也 | 便 | 成了 | 路。

and then take care of items with multiple pronunciations

Wǒ zài ménglóng 【◎Fix:◎zhōng;◎zhòng】, yǎnqián zhǎnkāi yī 【◎Fix:◎piàn;◎piān】 hǎibiān bìlǜ de shādì lái, shàngmian shēnlán de tiānkōng 【◎Fix:◎zhōng;◎zhòng】 guàzhe yī lún jīnhuáng de yuányuè. Wǒ xiǎng: xīwàng běn shì wúsuǒwèi yǒu, wúsuǒwèi 【◎Fix:◎wú;◎mó】 de. Zhè zhèngrú 【◎Fix:◎dìshang;◎dìshàng】 de lù; qíshí 【◎Fix:◎dìshang;◎dìshàng】 běn méiyǒu lù, zǒu de rén duō 【◎Fix:◎le;◎liǎo;◎liāo;◎liào;◎liáo】, yě 【◎Fix:◎biàn;◎pián】 chéngle lù.
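On a wiki, contributors would need some way to record their choices for each marked spot. Below is a rough Python sketch of how the 【◎Fix:◎…;◎…】 markers shown above could be listed and resolved. This is not Wenlin's own API. The names MARKER, list_choices, and resolve are hypothetical, and the pattern is inferred only from the two samples in this post.

```python
import re

# One marker looks like 【◎Fix:◎option1;◎option2】 (format inferred from the samples above).
MARKER = re.compile(r"【◎Fix:◎(.*?)】")


def list_choices(text):
    """Return the list of options for each marker, in order of appearance."""
    return [m.group(1).split(";◎") for m in MARKER.finditer(text)]


def resolve(text, picks=()):
    """Replace each marker with the chosen option; default to the first option."""
    position = 0

    def pick(match):
        nonlocal position
        options = match.group(1).split(";◎")
        choice = picks[position] if position < len(picks) else 0
        position += 1
        return options[choice]

    return MARKER.sub(pick, text)


sample = "duō 【◎Fix:◎le;◎liǎo;◎liāo;◎liào;◎liáo】, yě 【◎Fix:◎biàn;◎pián】 chéngle lù."
print(list_choices(sample))
print(resolve(sample))  # duō le, yě biàn chéngle lù.
```

Passing picks=[1, 0] would select the second option at the first marker and the first option at the second. The same functions work on the word-boundary markers, since those use the same delimiters.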

I’d prefer to keep things generally on the right side of copyright laws but am also hopeful that those may not be too onerous in the case of Pinyin versions and that Taiwan’s laws may put the situation more in our favor than might be the case elsewhere. Information about the legal situation would be greatly appreciated.

So, is anyone interested in helping out? Have advice? Success/horror stories about wiki projects? Suggestions for additional material?

Y.R. Chao’s responses to arguments against romanization

[Image: Y.R. Chao. Also, FWIW, Wikipedia took this image from Pinyin.Info, not the other way around.]

Pinyin.Info has a new reading: Responses to objections to romanization, written by the brilliant linguist Y.R. Chao in 1916, when he was a young man of 24.

It’s an unfortunate irony that another writing associated with Chao, the famous “stone lions” (a.k.a. shi, shi, shi) piece, is often mistakenly cited as evidence that the author opposed romanization. In fact, Chao favored using romanization for Mandarin, as his essay reveals.

It’s written in the form of 16 “objections,” each followed by Chao’s reply. For example:

Obj. 8 Alphabetized Chinese loses its etymology.

Rep. 8 This argument is like that often urged against simplified English spelling and is to be met similarly. In actual usage, how much attention do we give to etymology in words like 學, 暴, 發, 旋, 之, through, draught, etiquette, row, disaster? Of how many of these very common words do you know the original meaning? It is not to be denied, of course, that it is useful to know the etymology of words by looking them up, and our future dictionaries of alphabetized polysyllabic words should no doubt give their derivations.

The etymology of disaster (which is pretty cool) is certainly easy enough for an educated person to guess, if you stop to think about it. But I must admit I never had.

I have added notes following the text.

Detailed rules for Hanyu Pinyin: a major addition to Pinyin.Info

[Image: cover of Chinese Romanization: Pronunciation and Orthography]

For several years I’ve had online the brief official principles for writing Hanyu Pinyin. But those go only so far. Fortunately, Yin Binyong (Yǐn Bīnyōng / 尹斌庸) (1930-2003), who was involved in work on Hanyu Pinyin from the beginning, wrote two books on the subject, producing a detailed, logical, and effective orthography for Pinyin.

The only one of those two books with English explanations as well as Mandarin, Chinese Romanization: Pronunciation and Orthography (Mandarin title: Hànyǔ Pīnyīn hé Zhèngcífǎ / 汉语拼音和正词法 / 漢語拼音和正詞法), has gone out of print; and at present there are no plans to bring it back into print. Fortunately, however, I was eventually able to secure the rights to reproduce this work on Pinyin.Info. Yes, the entire book. So everybody be sure to say thank you to the generous publisher by buying Sinolingua’s books.

This book, which is nearly 600 pages long, is a mother lode of information. It would be difficult for me to overstate its importance. Over the next few months I’ll be releasing the work in sections. I had intended to delay this a little, as I have had to wait for a fancy new scanner and am still awaiting some OCR software that can handle Hanzi as well as the Roman alphabet. (This Web site is an expensive hobby!) But since Taiwan has recently adopted Hanyu Pinyin I will be releasing some material soon (without OCR, for the time being) in the hope of helping Taiwan avoid making mistakes in its implementation of an orthography for Pinyin here.

Watch this blog for updates.