some recent posts elsewhere

Although many notable stories have been in the news lately, I haven’t had time yet to comment on any of them. So for now I’d like to draw everyone’s attention to two recent posts elsewhere:

Pinyin, mispronounced Mandarin linked: Malaysian official

Although announcements in Mandarin are being mispronounced in Kuala Lumpur International Airport, that’s only to be expected because the announcers are paid little and must use Hanyu Pinyin, according to Malaysian Deputy Tourism Minister Datuk Donald Lim Siang Chai.

Bah. Pinyin doesn’t take long to learn. Moreover, it’s simple and accurate. The problem is simply a lack of training. Hanyu Pinyin is probably more closely phonetic than the spelling systems of any of the other languages the airport personnel would have to deliver announcements in.

Here’s the article:

Announcements in Mandarin pronounced wrongly at KL International Airport should be tolerated if the information is accurate, said Deputy Tourism Minister Datuk Donald Lim Siang Chai.

He said information should include time of flight arrivals and departures and gate numbers.

Lim attributed the wrong pronunciations to the announcers, who relied on hanyu pinyin (romanised Chinese).

“It is not easy to get good announcers given the low pay and long working hours,” he told reporters after opening a workshop organised by the Malaysia Mental Literacy Movement here yesterday.

Lim said RTM also has a similar problem in getting newsreaders fluent in dialects.

Sin Chew Daily reported last week that wrong pronunciations at KLIA had not only drawn laughter but also made some tourists irritated.

source: Info more important than how you say it, Star, May 14, 2006

via justnice.org ver 3.0

China’s Cultural Revolution, Pinyin, and other romanizations

Some people have the idea that because during the Cultural Revolution the Red Guards went about destroying much of China’s cultural heritage, they must have attacked Chinese characters and supported Pinyin. This idea is wrong. During that terrible time Pinyin was attacked, like so much else that was good in China.

With the fortieth anniversary of the beginning of the Cultural Revolution upon us, this might be a good time to bring out this selection from The Chinese Language: Fact and Fantasy, by John DeFrancis:

In view of the fact that separate alphabetic treatment for the regionalects has been a virtually tabooed subject since 1949, it comes as a surprise that among the revelations following the downfall of the Gang of Four is an account by Prof. Huang Diancheng of Amoy University of the adaptation of Pinyin to the Southern Min speech of Amoy and its use in the production of anti-illiteracy textbooks and other activities. Huang reports that during the Cultural Revolution people possessing materials in Min alphabetic writing were denounced as “foreign lackeys” and were forced to take the material out to the street, kneel down alongside them, set them afire, and reduce them to ashes. Elsewhere repression of Pinyin in any form was undertaken by xenophobic Red Guards, themselves staunch supporters of character simplification, who tore down street signs written in Pinyin as evidence of subservience to foreigners.

The Nazi-like book-burning episode and other acts against the use of Pinyin are fitting testimony of the repression exercised against activities concerned with fundamental issues in Chinese writing reform. In these actions the positive idea that China should stand on its own feet without demeaning reliance on foreign aid was expressed in its most xenophobic form as a sort of anti-intellectual blood-and-soil nativism that constitutes a danger, still present, of a Chinese-style fascism. The young student storm troopers who sought to humble the old-time intellectuals, far from following Lu Xun in embracing the one system of writing that would have done the most to equalize things between illiterates and all those who had received an education, supported instead the lesser reform of character simplification that might enhance their own position relative to the older generation.

OCR and Pinyin texts

[This entry is largely for my own reference. But feel free to read on, especially if you’re interested in OCR or if you somehow happen to have a lot of Pinyin texts lying around.]

What’s the best way to run optical character recognition (OCR) on texts written in Pinyin with tone marks? Adobe Acrobat 7.0 Standard, the most advanced such software I have on my computer, doesn’t have a “Pinyin” setting. I’d be surprised if any OCR software currently does.

Getting second tones, fourth tones, and umlauts to be read correctly shouldn’t be a big problem, given how the same marks are standard in the orthographies of many European languages. But first tones and third tones are a different matter. The best that can probably be hoped for at present is a more-or-less regular rendering of vowels with first- and third-tone marks as something else that can be fixed quickly through a search-and-replace procedure.
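As a rough illustration of that search-and-replace step, here is a minimal Python sketch. The mapping is hypothetical: it assumes the OCR engine rendered third-tone o and a as õ and ã, and the name `fix_ocr` is made up for this example. A real table would have to be built by looking at what a given language setting actually produces, and because an engine may render first- and third-tone vowels as the same character, the result would still need proofreading.

```python
import unicodedata

# Hypothetical table: OCR misreading -> intended Pinyin vowel.
# Escapes are used so each key is guaranteed to be a single code point.
OCR_FIXES = {
    "\u00f5": "\u01d2",  # õ -> ǒ
    "\u00d5": "\u01d1",  # Õ -> Ǒ
    "\u00e3": "\u01ce",  # ã -> ǎ
}
_TABLE = str.maketrans(OCR_FIXES)


def fix_ocr(text: str) -> str:
    """Normalize to precomposed characters, then apply the replacements."""
    return unicodedata.normalize("NFC", text).translate(_TABLE)


print(fix_ocr("w\u00f5 y\u00f5u"))
```

A single translation table keeps all the substitutions in one pass, which avoids one replacement accidentally feeding into another.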

Here’s an image, slightly reduced, of what was being scanned:
[Image: scan of sentences in Pinyin]

Here’s the text:
Wǒ bù shì xuézhě, bù néng yǐnjīng jùdiǎn. Dànshì wǒ yǒu cóng zìjǐ shēnghuó lǐ délái de wǔ ge zhēnshí lìzi, dōu biǎomíng Hànzì bìng bù tèbié biǎoyì.

Here are the results of OCR, with various language settings applied:

DUTCH
WÖ bu shi xuézhë, bù néng yinjing jùdiän. Danshi wö yöu cóng zip shënghuó
li délái de wü ge zhënshí Iizi, döu biäomíng Hanzì bing bu tebié biäoyì.

CATALAN
W6 bu shi xuezh8, bir neng yinjing jhdisn. Danshi w6 y5u cong ziji shenghuó
li delai de wü ge zhenshí lizi, d6u bibmíng Hanzi bing bu tebie bigoyi.

DANISH
W6 bu shi xuezhe, bU neng yinjing jhdian. Danshi w6 y5u cong ziji shGnghu6
li delai de wii ge zhenshi Iizi, dóu bigoming Hanzi bing bu tebie biaoyi.

FINNISH
WÖ bu shi xuezhe, bU neng yinjing jiidiän. Danshi wö yöu cong ziji shGnghu6
Ii delai de wii ge zhenshi Iizi, döu biäoming Hanzi bing bu tebie biäoyi.

FRENCH
W6 bù shi xuézhë, bù néng yinjing jùdian. Dànshi wO y5u cong ziji shënghu6
li délai de wü ge zhënshi Iizi, dou bigoming Hànzi bing bu tèbié biaoyi.

GERMAN
WÖ bu shi xuezhe, bU neng yinjing jiidiän. Danshi wö yöu cong ziji shGnghu6
li delai de wü ge zhenshi Iizi, döu biäoming Hanzi bing bu tebie biäoyi.

GERMAN (SWISS)
WÖ bu shi xuezhe, bU neng yinjing jiidiän. Danshi wö yöu cong ziji shGnghu6
li delai de wü ge zhenshi Iizi, döu biäoming Hanzi bing bu tebie biäoyi.

ITALIAN
W6 bu shì xuézhe, bù néng yinjing jùdian. Dànshì w6 y5u cong ziji shènghu6
li délai de wii ge zhenshi Iizi, dou bigoming Hànzì bing bu tèbié biaoyì.

NYNORSK
W6 bu shi xuezhe, bU neng yinjing jhdian. Danshi wO y5u cong ziji shGnghu6
li delai de wii ge zhenshi Iizi, dou biaoming Hanzi bing bu tebie biaoyi.

PORTUGUESE (BRAZILIAN)
WÕ bu shi xuézhe, bU néng yinjing jùdiãn. Danshi wõ yõu cóng ziji shènghuó
li délái de wü ge zhenshí Iizi, dõu biãomíng Hanzi bing bu tèbié biãoyi.

PORTUGUESE
WÕ bu shi xuézhe, bU néng yinjing jùdiãn. Danshi wõ yõu cóng ziji shènghuó
li délái de wü ge zhenshí Iizi, dõu biãomíng Hanzi bing bu tèbié biãoyi.

SPANISH
W6 bu shi xuézhe, bU néng yinjing jhdian. Danshi wO y5u cóng ziji shenghuó
li délái de wü ge zhenshí Iizi, dóu bigomíng Hanzi bing bu tebié biaoyi.

There’s no clear winner. The best results, such as they are, appear to be using Dutch and Portuguese (Brazilian or standard).

new MRT signage

David has posted on the inconsistent use of Tongyong Pinyin in the Taipei-area MRT system. I’ve already put a comment there, so I’ll not duplicate everything here.

I spend a lot of time complaining about signage, and my experiences in trying to get some errors in the MRT system corrected have, predictably, been frustrating. But there is something I do really like: the font for the MRT signage. (See the photos with David’s post.) Does anyone recognize it?

For those of you not in Taiwan, the MRT is the Metropolitan Rapid Transit system for the Taipei area. Most of the system takes the form of a subway. One line, however, is elevated, as is a section of a different line (which also runs on ground level for several miles).

Wenlin: ‘software for learning Chinese’

I get a lot of questions about how to do some sort of conversion involving Chinese characters. Most of the time, my answer is something like, “Get Wenlin. Even the free, non-expiring demo version (4 MB) will do what you need — and a lot more.”

For those of you who aren’t familiar with Wenlin, Random Stuff That Matters has posted a five-minute movie (with sound) of Wenlin in action (14.5 MB).

The range of what Wenlin can do extends far beyond what the movie shows. A lot of people might not notice that even in the demo a wide range of options are available under

  • Edit → Make Transformed Copy

My favorite, which is available only with the full version, is

  • Edit → Make Transformed Copy → Pinyin Transcription

Oh, it is a thing of beauty.

For those of you who have the full version, I thought I’d share a little-known feature of Wenlin: its ability to search for regular expressions.

Let’s say you are trying to remember a chengyu (set phrase) about studying, but all you can recall is that it contains the sound “rubu.” You’re not sure of the characters. You’re not even sure of the tones. First you look up entries beginning with “rubu” in Wenlin’s electronic edition of the ABC Chinese-English Comprehensive Dictionary:

  • List → Words by Pinyin
  • Then enter rubu and hit OK.

This will take you to rùbùfūchū and rúbùshèngyī. But neither of those is what you’re looking for. Now what? Here’s where regular expressions come in handy.

Hit Ctrl+F to search for something within the current page.

In the Find box, enter

  • re=r(u|ū|ú|ǔ|ù)b(u|ū|ú|ǔ|ù)

This will yield:

  • chǒngrǔbùjīng 寵辱不驚[宠–惊] f.e. unmoved by honors/disgrace
  • lèirúbùgān 淚濡不乾[泪–干] f.e. be drowned in tears
  • nièrúbùyán 囁嚅不言[嗫—] f.e. 〈wr.〉 move the mouth without speaking
  • xuérúbùjí 學如不及[学—] f.e. study as if one could never learn enough

Bingo!

The reason for using OR pipes to separate the possibilities instead of putting them together — i.e., the reason for writing (u|ū|ú|ǔ|ù) instead of [uūúǔù] — is that the regex library sees non-ASCII characters as strings of bytes (UTF-8); thus, without the pipes you could end up with extra garbage or not find what you intend to at all. This might be fixed in the next version.
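The byte-versus-character problem can be reproduced outside Wenlin. The sketch below uses Python's `re` module on UTF-8 bytes purely as a stand-in. It is not Wenlin's regex library, and the helper names `nfc` and `find` are invented for this example. It shows that a bracketed class over multi-byte characters turns into a class of individual bytes, while the piped alternation keeps each character's bytes together.

```python
import re
import unicodedata


def nfc(s: str) -> str:
    """Use precomposed characters so each vowel is one code point."""
    return unicodedata.normalize("NFC", s)


# Both patterns are compiled over raw UTF-8 bytes, not characters.
class_pat = re.compile(nfc("r[uūúǔù]b[uūúǔù]").encode("utf-8"))
alt_pat = re.compile(nfc("r(u|ū|ú|ǔ|ù)b(u|ū|ú|ǔ|ù)").encode("utf-8"))


def find(pattern, text):
    """Return the matched bytes, or None if nothing matched."""
    m = pattern.search(nfc(text).encode("utf-8"))
    return m.group(0) if m else None


# The class matches only one byte of the two-byte "ú", so the search fails.
print(find(class_pat, "xuérúbùjí"))

# The alternation matches whole characters.
print(find(alt_pat, "xuérúbùjí").decode("utf-8"))

# The class can also match half of an unrelated character ("é" here),
# which is the "extra garbage" case.
print(find(class_pat, "rubé"))
```

When Python's `re` is given `str` patterns and text instead of bytes, it works on whole characters, so the bracketed form behaves as expected there.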

surname-spelling scrap

Danwei has picked up on a story of someone in China with the surname of Xiè being issued an air ticket under the name Jiě. The reason behind the mixup is that the character used for this woman’s name, 解, is most often pronounced “jie,” as in jiěfàng (liberate; emancipate), jiějué (solve; resolve; settle), liǎojiě (understand; comprehend; find out; acquaint oneself with), and jiěshì (expound; interpret; analyze). It is but one of the many Chinese characters that have more than one pronunciation.

When she and some of her relatives went to the travel agency to get the matter cleared up, however, an argument broke out. Before long, people from the travel agency were using poles to beat the family.

(Maybe not my strongest entry, but there was no way I was going to pass up a chance to post on a story titled “Is personal safety another argument for Chinese romanization?”)


Zhou Youguang in the news again

Guangming Ribao has a long piece this week on Zhou Youguang, one of the main people behind the creation of Hanyu Pinyin: Zhōu Yǒuguāng: bǎisuì xīngchén, wénhuá cànrán (周有光:百岁星辰 文华灿然, Guāngmíng Rìbào, April 23, 2006). This also has lots of photos.

For autobiographical material by him in English, see A devotion that goes beyond words, from the South China Morning Post in the late 1990s.

For a selection of writings by him, see The Historical Evolution of Chinese Languages and Scripts (中国语文的时代演进 Zhōngguó yǔwén de shídài yǎnjìn).