URLs, Chinese characters, and the Roman alphabet

In Will China Build a Separate Internet? John Yunker, citing Naseem Javed’s When Will The Internet Be Divided Among Nations?, states, “Naseem does raise a very important point — for Chinese speakers, the Internet is far from user-friendly. The major obstacle is the URL, which is still limited to ASCII (Latin) characters.”

I don’t see where Naseem Javed made that particular point — but no matter. I just want to note that URLs in ASCII do not present an obstacle to Internet users in China. After all, the Roman alphabet (specifically, Pinyin) is what most people use to enter Chinese characters on computers in the first place. And even those in China who don’t use Pinyin to input Chinese characters are perfectly capable of using their, yes, QWERTY keyboards to type the ASCII in URLs, the Roman alphabet having been taught for decades to every schoolchild in China (at least to those now literate enough to use the Internet in the first place).

On the other hand, having to enter Chinese-character URLs would be an obstacle to most of the world’s population.

Those looking to argue that ASCII URLs could be an obstacle would do better to look to Russia, Greece, or Saudi Arabia.

The folks at ICANN and IETF are working to upgrade the DNS to Unicode, but this will take time. There is a workaround in use that allows Web users to input Chinese characters as a URL which is then transformed into ASCII characters behind the scenes (known as “Punycode”) but I’m not sure how widely used this system currently is.

IE7 is supposed to have good support for Punycode. Now if only IE would finally get CSS right….

Here’s an example of Punycode: 拼音 is xn--muuy29i, according to an open-source Punycode converter. Thus, http://拼音.pinyin.info and http://xn--muuy29i.pinyin.info should both lead to the same page. And I would hope that the address bar in the browser would read http://拼音.pinyin.info instead of the xn--muuy29i ASCII version.
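For anyone who wants to check the conversion without an online converter, here is a minimal sketch in Python. It assumes that Python's built-in "idna" codec (which follows the older IDNA 2003 rules) is a fair stand-in for what a browser does behind the scenes; the helper names to_ascii and to_unicode are mine, not part of any standard.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form, label by label.

    Assumption: Python's built-in "idna" codec approximates what a browser does.
    """
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII hostname with xn-- labels back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


# The raw Punycode for the label itself, without the "xn--" prefix.
print("拼音".encode("punycode").decode("ascii"))

# The full hostname in both directions.
print(to_ascii("拼音.pinyin.info"))
print(to_unicode("xn--muuy29i.pinyin.info"))
```

Labels that are already plain ASCII, such as pinyin and info, pass through unchanged; only the Chinese-character label picks up the xn-- prefix.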

If you add a comment on how well the Punycode tests work for you, please mention your computer’s operating system and browser. (I’m using Win2K and Opera 8.51, and both http://拼音.pinyin.info and http://xn--muuy29i.pinyin.info work fine.)

The ‘g’ in Ang Lee

Ang Lee (李安), the director of Brokeback Mountain, Crouching Tiger, Hidden Dragon, The Ice Storm, Sense and Sensibility, Eat Drink Man Woman, and many other films, was recently back here in his homeland of Taiwan.

I’ve long wondered how Lee ended up with such an odd form for the romanization of his name. I’m not referring to the spelling of his family name, Lee. Although the Anglicization of “Lee” for 李 is not standard in any of the main romanization systems, that particular spelling is almost certainly more common in Taiwan than “Li,” which is the form in most romanization systems other than Gwoyeu Romatzyh. In Gwoyeu Romatzyh, which nominally was Taiwan’s official romanization system until 1986 (long after Lee had acquired a passport and gone to the United States), the name is written Lii; but I’ve never seen that spelling used for a name here.

No, what puzzles me is the g in his given name of “Ang.” In Mandarin, this is one of those relatively rare syllables spelled the same in basically all of the main romanization systems: an. So where is that g from? (Please don’t read through the rest of this message in any kind of suspense, because I still don’t know the answer to that question, though I’m hoping one of my readers will.)

For those unfamiliar with Mandarin, Ang Lee’s given name is not originally pronounced like American English’s bang without the b or sang without the s. Rather, the a is similar to that in the English word father; the n is about as you’d expect; and there’s no g. So the name is pronounced something like the French (not English) version of Anne or the end of the German Autobahn.

The Ang spelling doesn’t appear to come from Taiwanese. Even in Taiwanese 安 would be romanized as an, not ang, in the dominant systems. (Correct me if I’m wrong, please. I know almost no Taiwanese.) Also, at the time Lee would have adopted the Ang spelling, the use of Taiwanese romanization for names would most certainly have been intensely frowned upon by the authorities if not forbidden outright. Moreover, I don’t think Lee is even ethnically Taiwanese/Hokkien.

Of course, he may have chosen to use a spelling other than what he was made to use on his passport. But people in Taiwan seldom do that unless they adopt an “English” name, which “Ang” is certainly not. The g might be there to help prevent people from thinking he’s a woman named Ann. But if that were the concern, why not simply adopt an English name?

Poagao, who met Ang Lee in Taipei last week, met back in September with Lee’s little brother, who’s known as Khan (or perhaps Kan) Lee. As Poagao notes, there’s something strange with that name, too:

One thing I’d like to know is why “Ang” gets an unneccesary ‘g’ (it should be “An”), while “Kan” is one ‘g’ short (it should be “Kang”). Did Ang steal his little brother’s ‘g’ at some point?

Ang Lee’s brother’s name is Lǐ Gǎng in Hanyu Pinyin. (Theoretically, it could also be Lǐ Gàng or Lǐ Gāng because 崗 is one of many Chinese characters with multiple pronunciations.) The use of k rather than g comes from the Wade-Giles romanization system. In Taiwan, most people’s passports have names romanized using improper, bastardized Wade-Giles, which helps create a lot of confusion — as if Wade-Giles itself weren’t confusing enough already. Moreover, Taiwan’s passport office operates on the principle of chabuduo jiu keyi, which in this context is a close approximation of the English saying “close enough for government work.” In other words, if a spelling looks more or less correct it will probably pass — unless, that is, it has Hanyu Pinyin’s x or q in it, in which case it would probably be rejected. (I’m not making this up. I’ve spoken with people in the passport office about this.)
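To make the k/g business concrete, here is a rough sketch in Python of how the stop initials of Hanyu Pinyin line up with those of Wade-Giles, and what happens when the apostrophes get dropped. The function name and the keep_apostrophe flag are made up for illustration; the sketch covers only the six stop initials, ignores tones, and leaves the rest of the syllable untouched, which is safe only for finals spelled the same in both systems (such as an and ang).

```python
# Hanyu Pinyin stop initials and their Wade-Giles counterparts.
# Wade-Giles marks aspiration with an apostrophe rather than a different letter.
PINYIN_TO_WG_INITIAL = {
    "b": "p", "p": "p'",
    "d": "t", "t": "t'",
    "g": "k", "k": "k'",
}


def wade_giles(syllable: str, keep_apostrophe: bool = True) -> str:
    """Convert the initial of a toneless, lowercase Pinyin syllable to Wade-Giles.

    Hypothetical helper: only the stop initials above are converted, and the
    final is copied as-is.
    """
    initial, rest = syllable[0], syllable[1:]
    result = PINYIN_TO_WG_INITIAL.get(initial, initial) + rest
    # Dropping the apostrophe is the "bastardized" practice described above.
    return result if keep_apostrophe else result.replace("'", "")


print(wade_giles("gang"))                         # Pinyin gang
print(wade_giles("kang"))                         # Pinyin kang
print(wade_giles("kang", keep_apostrophe=False))  # apostrophe dropped
```

With the apostrophe kept, Pinyin gang and kang stay distinct (kang vs. k'ang). Once it is dropped, both come out as kang, which is one source of the confusion.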

In looking through Lee Ang’s biography I noticed that he has two sons, one of whom is named “Haan,” at least according to the Internet Movie Database’s credits for Pushing Hands, one of Lee’s early movies. At first, I thought this might be a two-syllable given name that had been run together: Ha’an (or Ha-an, following the style used in Taiwan). Could this be the same an as in Ang Lee’s name — just this time without the mysterious g? But it turns out that Haan is a one-syllable name.

Here’s the character: 涵.

A doubled vowel in romanized Mandarin usually indicates the use of Gwoyeu Romatzyh’s tonal spelling. But the “Haan” spelling would be for third tone, while Haan’s name should be pronounced with a second tone. (This would be written “Harn” in Gwoyeu Romatzyh.)
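For the curious, the tonal-spelling pattern at work here can be sketched in a few lines of Python. This is a deliberately tiny subset of Gwoyeu Romatzyh, not the full system: it handles only the finals an, en, ang, and eng, and it refuses the initials m, n, l, and r, which follow different rules for the first and second tones. The function name gr_spelling is my own.

```python
SUPPORTED_FINALS = {"an", "en", "ang", "eng"}
SONORANT_INITIALS = {"m", "n", "l", "r"}  # these follow other rules; not handled


def gr_spelling(initial: str, final: str, tone: int) -> str:
    """Spell a syllable in Gwoyeu Romatzyh for a small subset of finals.

    Simplified sketch: tone 1 is the basic form, tone 2 inserts r after the
    vowel, tone 3 doubles the vowel, and tone 4 changes -n to -nn and -ng to -nq.
    """
    if final not in SUPPORTED_FINALS:
        raise ValueError(f"final {final!r} is outside this sketch")
    if initial in SONORANT_INITIALS:
        raise ValueError(f"initial {initial!r} is outside this sketch")
    if tone not in (1, 2, 3, 4):
        raise ValueError("tone must be 1, 2, 3, or 4")

    vowel, coda = final[0], final[1:]
    if tone == 1:
        return initial + vowel + coda
    if tone == 2:
        return initial + vowel + "r" + coda
    if tone == 3:
        return initial + vowel * 2 + coda
    return initial + vowel + {"n": "nn", "ng": "nq"}[coda]


for tone in (1, 2, 3, 4):
    print(tone, gr_spelling("h", "an", tone))
```

Under these assumptions the four tones of han come out as han, harn, haan, and hann, which is why “Haan” reads as third tone and the expected second-tone spelling is “Harn.”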

So perhaps the IMDB entry is a typo, and the real spelling should be Han, as expected. Or maybe those in the Lee family just like funny spellings.

Some say ‘no 3Q’ to Net slang in Chinese test

Internet slang and emoticons were included in the Chinese-language section of this year’s college-entrance exam for Taiwan, to the dismay and confusion of many.

Examples of this in the exam include

  • ::>_< ::
  • 3Q
  • Orz

::>_< :: is supposed to represent crying. (The colons are tears, the underscore is the mouth, and the others are the eyes.)

For "3Q," the three is pronounced san and the Q is pronounced as in English, yielding "san Q," which is meant to represent the English phrase "thank you."

"Orz" is intended to be a pictograph of a person bowing down on the floor, with the O as the head, the vertical line of the r as the arms, and the z as the legs.

This test is crucial to the lives of those seeking to enter post-secondary education. Many students spend years studying for this exam. The nation's parents, stressed-out from worry about how their children will do on this test, will probably go ballistic over this. I'll be surprised if those questions end up being counted toward the final score.

On the other hand, I can't help but think that given how much Classical Chinese is certain to be on the test, a few questions about modern Internet slang might not be inappropriate. After all, the latter is likely to be more relevant to the majority of today's college students, and is possibly even more a part of modern Mandarin than some parts of literary Sinitic are.


Vietnamese culture appears shallow without Chinese characters, says Chinese writer

The bias many people in China have toward Chinese characters and against romanization is so entirely common that it’s hardly newsworthy. But I should probably bring up examples from time to time, just as a reminder. Here’s one.

The vice president of the Chinese Writers Association, Chen Jiangong (Chén Jiàngōng / 陈建功), recently gave a wide-ranging talk in Guangzhou. He touched on Vietnam’s adoption of the roman alphabet for its writing system:

Wǒ xiǎngqǐ le wǒmen zài shàng ge shìjì sānshí niándài de shíhou, Gùgōng Bówùyuàn de Yè Péijī yuànzhǎng shuō wénhuà ruò wáng zé yǒng wú bǔjiù, zhè shǐ wǒ xiǎngqǐ wǒ céngjīng fǎngwèn Yuènán de shíhou, jiù fāxiàn Yuènán zhèige mínzú guòqù cǎiyòng de shì Hànzì, zài shàng ge shìjì chū de shíhou, yīnwèi yī ge Fǎguó chuánjiàoshì wèile chuánbō tāmen de Jīdūjiào wénmíng, suǒyǐ jiù fāmíng le Lādīngwén de pīnyīn zìmǔ, Yuènánrén kāishǐ zhújiàn bùyòng Hànzì, jiù yòng Lādīng zìmǔ lái pīn Yuènán wén le, wǒ zài Yuènán fāxiàn tāmen de zuòjiā xiě de wénzhāng dōu shì yòng Lādīng zìmǔ lái pīn, zhèyàng jiù xiǎn de Yuènán de wénhuà gēnjī xiǎnde jíqí fúqiǎn le, wǒ jiù xiǎngqǐ le Yè Péijī de zhè jù huà.

Here’s a paraphrased translation:

In the 1930s Ye Peiji, the head of the Imperial Palace Museum, said that if culture is lost it’s gone forever. When I visited Vietnam I learned that the Vietnamese people once used Chinese characters. But because a French missionary invented a romanization method in order to spread Christianity, Vietnamese people gradually began not to use Chinese characters and instead used romanization for their language. In Vietnam, I discovered that their writers’ works all use romanization. Thus, the foundation for Vietnamese culture appears to be extremely superficial. This immediately brought to mind Ye Peiji’s words.

Pretty typical.

source: Zhùmíng zuòjiā Chén Jiàngōng lùn wénxué: Guǎngzhōu bù shì wénhuà shāmò (著名作家陈建功论文学:广州不是文化沙漠), Dàyáng Wǎng, December 16, 2005

Chinese characters, Pinyin, and computers

Recently added to my list of recommended readings: Characters and Computers, edited by Victor H. Mair and Yongquan Liu. Although this collection was published in 1991 and thus no longer represents the state of the art, the issues raised here remain relevant.

Of particular interest, at least where Pinyin is concerned, is the important essay Pinyin-to-Chinese Character Computer Conversion Systems and the Realization of Digraphia in China, by Yin Binyong, who has also written the books on Pinyin orthography: Chinese Romanization: Pronunciation and Orthography and the Xinhua Pinxie Cidian. The complete text of this substantial essay (nearly 6,000 words) is available here on Pinyin Info. I strongly encourage everyone to read this.

Here are the subject headings:

  1. The Three Stages in the Development of Pinyin-to-Chinese Character Computer Conversion Systems
  2. The Theoretical Contribution of the Pinyin-to-Chinese Character Conversion System to the Realization of Digraphia in China
  3. Practical Contributions of Pinyin-to-Chinese Character Conversion Systems to Digraphia in China
    1. Can alphabetized Chinese take the road of “pinyin pictophonetic characters”?
    2. What is an appropriate way to handle the representation of tones in a Pinyin-based writing system?
    3. How to solve the problem of homonyms in alphabetized (Pinyin) Chinese writing?
  4. Directions for the Future

paper on Tongyong and Hanyu Pinyin in Taiwan

One-Soon Her (何萬順 / Hé Wànshùn), a professor in the Graduate Institute of Linguistics at Taiwan’s National Chengchi University (Guólì Zhèngzhì Dàxué), published a paper last month on Taiwan’s romanization issue in one of Academia Sinica’s journals: 「Quánqiúhuà」yǔ「zài dì huà」: cóng xīn jīngjì de jiǎodù kàn Táiwān de pīnyīn wèntí (Between Globalization and Indigenization: On Taiwan’s Pinyin Issue from the Perspectives of the New Economy).

Here’s the English abstract:

The only remaining controversy in Taiwan’s efforts to standardize its pinyin system for Chinese is whether to adopt Tongyong or Hanyu; while the former has an intense symbolic value of indigenization, the latter enjoys a substantial globalized distribution. This paper first makes clear the nature of ‘interface’ of any pinyin system and examines this seemingly domestic issue from the perspectives of the New Economy in the global Information Age. Given the characteristics of ‘increasing returns’ and ‘path-dependence’, Hanyu Pinyin, with its universal standardization and dominant global market share, is the obvious choice. Taiwan’s implementation of Tongyong Pinyin must necessarily incur the cost of dual interfaces. Given the 85% overlap between the two systems, Tongyong, as a politically meaningful symbol, ironically, creates a division among Taiwan’s population. The unfortunate politicization of the pinyin issue has cornered the nation into a dilemma: Tongyong costs economically, Hanyu costs politically. The ultimate reconciliation thus hinges upon the implementation of a system that optimizes Tongyong’s indigenized symbolic value and Hanyu’s globalized substance, to the furthest extent possible.

I disagree with the 85 percent figure; but the number doesn’t matter much in Her’s approach, which, considering he’s a linguist, is surprisingly non-linguistic. He gives two main recommendations for Taiwan’s central government, meant to be taken together. The first of these is that Taiwan should make Tongyong Pinyin the nation’s sole romanization system for Mandarin, with compliance among cities and counties mandatory. The delightfully arch second requirement, however, has an interesting twist: Everything that’s different between the national standard (i.e., Tongyong Pinyin) and the international standard (i.e., Hanyu Pinyin) should be changed to conform to the international standard. In other words, Taiwan should have Hanyu Pinyin in all but name.


I’d be OK with that. But I doubt Tongyong supporters will be willing to go along.

Many thanks to Dan Jacobson for the link.

Here are the essay’s subject headings:

  1. Qiányán: zài Tōngyòng yǔ Hànyǔ zhījiān
  2. pīnyīn xìtǒng de jièmiàn gōngnéng
  3. xīn jīngjì de xiànshí tèzhì
    1. lùjìng qǔjué
    2. wǎnglù xiàoyìng
    3. suǒdìng xiàoyìng
  4. jiànpán jièmiàn de lèibǐ
    1. dúbà quánqiú de QWERTY jiànpán
    2. Dvorak de jìngzhēng shībài
    3. jiànpán shìchǎng de jīngjì jiàoxun
    4. jiànpán jièmiàn yǔ pīnyīn jièmiàn de lèibǐ
    5. Tōngyòng Pīnyīn de「zài」zhuǎnhuàn dàijià
    6. pīnyīn yǐ shì zuórì de páijú yóu xì
  5. Yīngyǔ pīnyīn de lèibǐ
  6. pīnyīn lùnzhèng de qīzhébākòu
    1. 「biāozhǔnhuà」yǔ「lǒngduàn」de hùnxiáo
    2. Tōngyòng yǔ jiāo luó de zhēngyì
    3. Tōngyòng de fēi jīngjì lùnzhèng
    4. Tōngyòng Pīnyīn de fēnliè xiàoyìng
    5. Tōngyòng zhuǎnhuàn Hànyǔ de máodùn
    6. Tōngyòng yǔ Hànyǔ「xiāngróng」de máodùn
    7. pīnyīn dà héjiě de kěnéng fāngxiàng
  7. jiélùn:néng hézuò,guójiā rénmín cáinéng zhìfù
  8. cānkǎo shūmù(Zhōngwén shūmù àn bǐhuà páixù)

In case anyone’s wondering about the references to QWERTY and Dvorak, Her is drawing an analogy, saying the situation with Hanyu is largely the same as with QWERTY: whatever the merits of other systems, it’s very likely to remain the standard.

Y.R. Chao and Humpty Dumpty

I’ve just added Y.R. Chao’s Sayable Chinese series to my list of recommended books. The second book in this series of three comprises Chao’s delightful translation of Lewis Carroll’s Through the Looking-Glass. I’ve selected part of the Humpty Dumpty chapter for the sample reading on Pinyin Info. Although the sample has romanization and English, the Sayable Chinese books have romanization and Chinese characters, presented en face. (I’ll add the Chinese characters one of these days, but they’re so much trouble to type! And scanning isn’t much of an improvement.)

These hardback books are a good deal at US$15 each.

The romanization method is Chao’s own Gwoyeu Romatzyh system.

Here’s a sample, with Hanyu Pinyin for comparative purposes:

Gwoyeu Romatzyh

Keesh neh jitzeel yueh jaang yueh dah, yueh jaang yueh shianq ren-yanql: Alihsy tzoou-dawle i-leang-janq luh gencheal, jiow kann.chu ta yeou yeanjing byitz tzoei lai le; ta tzay tzoou-jinn ideal jiow chingchingchuuchuu de kann.chulai ta jiowsh HUENDIH DUENDIH been-ren le. Ta duey tzyhjii shuo, “Jeh buhuey sh byeren le! Yonq.bu-jaur geei ta shieele maan-lean de mingtz woo jiow idinq jydaw sh ta le!”

Hanyu Pinyin

Kěshì nèi jīzǐr yuè zhǎng yuè dà, yuè zhǎng yuè xiàng rényàngr: Ālìsī zǒudào le yī liǎng zhàng lù gēnqián, jiù kànchū tā yǒu yǎnjing bízi zuǐ lái le; tā zài zǒujìn yīdiǎnr jiù qīngqingchǔchǔ de kànchūlai tā jiùshì HŪNDÌ DŪNDÌ běnrén le. Tā duì zìjǐ shuō, “Zhè bù huì shì biéren le! Yòngbuzháo gěi tā xiě le mǎnliǎn de míngzi wǒ jiù yīdìng zhīdao shì tā le!”

English

However, the egg only got larger and larger, and more and more human: when she had come within a few yards of it, she saw that it had eyes and a nose and mouth; and when she had come close to it, she saw clearly that it was HUMPTY DUMPTY himself. “It can’t be anybody else!” she said to herself. “I’m as certain of it, as if his name were written all over his face.”

Chinese characters

可是那雞子兒越長越大,越長越象人樣兒:阿麗思走到了一兩丈路跟前,就看出它有眼睛鼻子嘴來了;她再走近一點兒就清清楚楚的看出來它就是昏弟敦弟本人了。她對自己說,“這不會是別人了!用不著給他寫了滿臉的名字我就一定知道是他了!”

Recordings of all of the books in the Sayable Chinese series are available on cassette. The recordings were made by Chao and his family. Unfortunately, I don’t have any of these (they’re expensive!), so I can’t supply a sound file for the section above. The Folkways recording of Chao’s Mandarin Primer, however, has a 30-second excerpt from the Tweedledum and Tweedledee section. Here’s the English version of what’s being said:


Tweedledee: You like poetry?

Alice: Ye-es, pretty well — some poetry. Would you tell me which road leads out of the wood?

Tweedledee: What shall I repeat to her? “The Walrus and the Carpenter” is the longest.

The sun was shining—

Alice: If it’s very long, would you tell me first which road —

Tweedledee:

The sun was shining—

I think that’s Chao’s daughter, Rulan Chao Pian, as Alice, and Chao as Tweedledee.

Book Three contains Chao’s adaptation of The Mollusc, a 1908 stage comedy by H. H. Davies. (Alas, Project Gutenberg doesn’t have the text of this yet.) Interestingly, this play has an association with another romanization-related figure, Harold E. Palmer, who published The Principles of Romanization in 1930 and who was a leading figure in the field of English teaching. Palmer’s daughter Dorothee published an “annotated phonetic edition” (complete with tone marks) of The Mollusc in 1929. (Palmer had taught her how to read and write in phonetic notation, leaving her to pick up traditional spelling on her own!)

Pinyin Info also has the text of Y.R. Chao’s much-misunderstood stone lions story. Chao was making a point about Classical Chinese, not modern Mandarin. As the architect of a romanization system, Chao understood perfectly well that Mandarin is not doomed to a hell of homophony without Chinese characters.