Chinese characters, Pinyin, and computers

Recently added to my list of recommended readings: Characters and Computers, edited by Victor H. Mair and Yongquan Liu. Although this collection was published in 1991 and thus no longer represents the state of the art, the issues raised here remain relevant.

Of particular interest, at least where Pinyin is concerned, is the important essay Pinyin-to-Chinese Character Computer Conversion Systems and the Realization of Digraphia in China, by Yin Binyong, who has also written the books on Pinyin orthography: Chinese Romanization: Pronunciation and Orthography and the Xinhua Pinxie Cidian. The complete text of this substantial essay (nearly 6,000 words) is available here on Pinyin Info. I strongly encourage everyone to read this.

Here are the subject headings:

  1. The Three Stages in the Development of Pinyin-to-Chinese Character Computer Conversion Systems
  2. The Theoretical Contribution of the Pinyin-to-Chinese Character Conversion System to the Realization of Digraphia in China
  3. Practical Contributions of Pinyin-to-Chinese Character Conversion Systems to Digraphia in China
    1. Can alphabetized Chinese take the road of “pinyin pictophonetic characters”?
    2. What is an appropriate way to handle the representation of tones in a Pinyin-based writing system?
    3. How to solve the problem of homonyms in alphabetized (Pinyin) Chinese writing?
  4. Directions for the Future

Y.R. Chao and Humpty Dumpty

I’ve just added Y.R. Chao’s Sayable Chinese series to my list of recommended books. The second book in this series of three comprises Chao’s delightful translation of Lewis Carroll’s Through the Looking-Glass. I’ve selected part of the Humpty Dumpty chapter for the sample reading on Pinyin Info. Although the sample has romanization and English, the Sayable Chinese books have romanization and Chinese characters, presented en face. (I’ll add the Chinese characters one of these days, but they’re so much trouble to type! And scanning isn’t much of an improvement.)

These hardback books are a good deal at US$15 each.

The romanization method is Chao’s own Gwoyeu Romatzyh system.

Here’s a sample, with Hanyu Pinyin for comparative purposes:

Gwoyeu Romatzyh

Keesh neh jitzeel yueh jaang yueh dah, yueh jaang yueh shianq ren-yanql: Alihsy tzoou-dawle i-leang-janq luh gencheal, jiow kann.chu ta yeou yeanjing byitz tzoei lai le; ta tzay tzoou-jinn ideal jiow chingchingchuuchuu de kann.chulai ta jiowsh HUENDIH DUENDIH been-ren le. Ta duey tzyhjii shuo, “Jeh buhuey sh byeren le! Yonq.bu-jaur geei ta shieele maan-lean de mingtz woo jiow idinq jydaw sh ta le!”

Hanyu Pinyin

Kěshì nèi jīzǐr yuè zhǎng yuè dà, yuè zhǎng yuè xiàng rényàngr: Ālìsī zǒudào le yī liǎng zhàng lù gēnqián, jiù kànchū tā yǒu yǎnjing bízi zuǐ lái le; tā zài zǒujìn yīdiǎnr jiù qīngqingchǔchǔ de kànchūlai tā jiùshì HŪNDÌ DŪNDÌ běnrén le. Tā duì zìjǐ shuō, “Zhè bù huì shì biéren le! Yòngbuzháo gěi tā xiě le mǎnliǎn de míngzi wǒ jiù yīdìng zhīdao shì tā le!”


English

However, the egg only got larger and larger, and more and more human: when she had come within a few yards of it, she saw that it had eyes and a nose and mouth; and when she had come close to it, she saw clearly that it was HUMPTY DUMPTY himself. “It can’t be anybody else!” she said to herself. “I’m as certain of it, as if his name were written all over his face.”

Chinese characters


Recordings of all of the books in the Sayable Chinese series are available on cassette. The recordings were made by Chao and his family. Unfortunately, I don’t have any of these (they’re expensive!), so I can’t supply a sound file for the section above. The Folkways recording of Chao’s Mandarin Primer, however, has a 30-second excerpt from the Tweedledum and Tweedledee section. Here’s the English version of what’s being said:


Tweedledee: You like poetry?

Alice: Ye-es, pretty well — some poetry. Would you tell me which road leads out of the wood?

Tweedledee: What shall I repeat to her? “The Walrus and the Carpenter” is the longest.

The sun was shining—

Alice: If it’s very long, would you tell me first which road —


The sun was shining—

I think that’s Chao’s daughter, Rulan Chao Pian, as Alice, and Chao as Tweedledee.

Book Three contains Chao’s adaptation of The Mollusc, a 1908 stage comedy by H. H. Davies. (Alas, Project Gutenberg doesn’t have the text of this yet.) Interestingly, this play has an association with another romanization-related figure, Harold E. Palmer, who published The Principles of Romanization in 1930 and who was a leading figure in the field of English teaching. Palmer’s daughter Dorothee published an “annotated phonetic edition” (complete with tone marks) of The Mollusc in 1929. (Palmer had taught her how to read and write in phonetic notation, leaving her to pick up traditional spelling on her own!)

Pinyin Info also has the text of Y.R. Chao’s much-misunderstood stone lions story. Chao was making a point about Classical Chinese, not modern Mandarin. As the architect of a romanization system, Chao understood perfectly well that Mandarin is not doomed to a hell of homophony without Chinese characters.

Chinese New Year

The year 2006 has already begun here in Taiwan, but we have several more weeks to go before Chinese New Year, which will be on January 29 and which will be the beginning of a year of the dog.
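
For anyone wondering how the animal follows from the year number, here’s a rough Python sketch. It relies only on the 12-year cycle and on the fact that 2006 is a dog year. The function name is my own, and the English animal names are just one common set of translations. It also ignores the catch that the animal changes at Chinese New Year rather than on January 1, so dates in January or early February may really belong to the previous animal.

```python
# The twelve animals in cycle order, starting from the rat.
ANIMALS = ["rat", "ox", "tiger", "rabbit", "dragon", "snake",
           "horse", "goat", "monkey", "rooster", "dog", "pig"]

def zodiac_animal(year):
    """Return the animal of the Chinese year that begins during `year`."""
    # The offset of 4 lines the cycle up so that 2006 lands on the dog.
    return ANIMALS[(year - 4) % 12]

print(zodiac_animal(2006))  # dog
```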

I’ve dusted off and restyled some lists I made several years ago of all the dates of Chinese New Year between the years 1645 and 2644 — one thousand years in total. (See link above.) I think this is a nice resource, though it doesn’t have much to do with the normal concerns of this site.

One possible connection: It might be interesting to hear people’s views on the question of how to translate the names of some of the animals associated with the years in the Chinese calendar: “rat” or “mouse,” “ox” or “cow,” “goat” or “sheep,” and “rooster” or “chicken.” Of course that final example has another possible translation, but I still recall the difficulty of keeping a straight face the time one of my students in China pointed to a map of the country and told me, in all innocence, “China looks like a big cock.”

For lots more information on the Chinese lunisolar calendar, see Helmer Aslaksen’s Mathematics of the Chinese Calendar.

BTW, in case anyone is confused by my choice of ? (xīn) rather than the more common ? (also xīn) in “??” (xīnchūn —— I’ll finish in the morning

icons — please vote

A “favorites icon” (“favicon,” for short) has long been on the to-do list for this site. These icons are small images, just 16 pixels by 16 pixels, that can appear in bookmarks for a Web site and in the address bar. In some browsers, such as Opera, they also appear on the browser tabs, which is a nice touch.

Probably the most common look for icons is achieved by incorporating a letter of the alphabet: Yahoo (a red Y with an exclamation mark), Google (a large blue capital G), Opera (a large red shadowed O), the New York Times (an ornate T), Forumosa (an F).

Some icons use Chinese characters: Wenlin (“Wenlin” in Chinese characters), No-Sword (the Chinese character wu2, meaning “without” or “nothingness”).

And some are more abstract or pictorial: the Notetab text editor (a white cross against a red background), the Panda’s Thumb (a tiny image of a panda), Photo Net (an image of a camera).

This being the sort of site it is, I’m not going to use a Chinese character — not unless I could fit romanization in as well. And I doubt that can be done within a 16 by 16 square.

Ideally, I’d like to have something in the style of Xu Bing’s “new English calligraphy.” Here’s roughly the effect I’d be shooting for:
[image: the word “pinyin” written in the style of Chinese characters, after the method of artist Xu Bing]

(That’s “P-I-n-Y-I-n”, in case you’re wondering.)

Unfortunately, however, that sort of thing doesn’t work very well when reduced to icon size. About the best I could come up with is this: [image: icon for Pinyin Info]. But I’m not so sure about that.

I’d like to get input from my readers. Which of the following do you prefer?

  1. — largely the same as no. 1
  2. — the P is light green
  3. — the P is white
  4. — faux Xu Bing
  5. other (please specify)

Please let me know what you think with a comment here or through e-mail.

If you have an image you’d like to use for your site’s icon but don’t have the software to turn it into icon format, you could try this online favicon generator. It will reduce your image to the correct size and put it in .ico format.

Then place the resulting image, which should be named favicon.ico for maximum browser compatibility, in the root directory of your site. To make Internet Explorer happy, you could also add the following to the head of your HTML:
<link rel="shortcut icon" href="/favicon.ico" />
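
If you’re curious what actually ends up inside a .ico file, here’s a rough Python sketch that packs a 16 by 16 grid of pixels into the format as I understand it: a small directory followed by an uncompressed bitmap. The function name and the pixel layout (rows of red, green, blue, alpha tuples) are made up for this example. It skips the resizing step entirely, and it is not how the online generator mentioned above works.

```python
import struct

def make_ico(pixels):
    """Pack a 16x16 grid of (r, g, b, a) tuples into the bytes of a .ico file."""
    size = 16
    if len(pixels) != size or any(len(row) != size for row in pixels):
        raise ValueError("expected a 16x16 grid of pixels")

    # Colour data: 32 bits per pixel, bottom row first, in B, G, R, A order.
    color = bytearray()
    for row in reversed(pixels):
        for r, g, b, a in row:
            color += bytes((b, g, r, a))

    # 1-bit transparency mask, each row padded to 4 bytes. All zeros here,
    # which leaves transparency to the alpha channel.
    mask = bytes(4 * size)

    # BITMAPINFOHEADER; the height field counts colour rows plus mask rows.
    header = struct.pack("<IiiHHIIiiII", 40, size, size * 2, 1, 32, 0,
                         len(color) + len(mask), 0, 0, 0, 0)
    image = header + bytes(color) + mask

    icon_dir = struct.pack("<HHH", 0, 1, 1)  # reserved, type 1 (icon), 1 image
    # Width, height, palette size, reserved, planes, bits per pixel,
    # size of the image data, and its offset from the start of the file.
    entry = struct.pack("<BBBBHHII", size, size, 0, 0, 1, 32, len(image), 6 + 16)
    return icon_dir + entry + image

# A solid red square, just to have something to write out.
red_square = [[(255, 0, 0, 255)] * 16 for _ in range(16)]
ico_bytes = make_ico(red_square)
```

Writing `ico_bytes` to a file named favicon.ico, opened in binary mode, should give a plain red icon.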

In other Pinyin Info image news, I’ve added a script to the Pinyin Info home page that will put up random images and links to readings on this site. I hope it helps let people know that there’s a lot more on this site than might appear at first glance.

Finally, since logos and icons are often associated with “ideographs,” this seems like a good place to recommend John DeFrancis’s reading on the ideographic myth, for anyone who hasn’t read that already.

Pinyin Info in the New York Times

Pinyin Info made the Reading File of this Sunday’s New York Times, with Victor H. Mair’s essay danger + opportunity ≠ crisis being quoted:

On [Pinyin Info], a Web site about the Chinese language, Victor H. Mair, a professor of Chinese at the University of Pennsylvania, explodes the myth that “crisis,” in Chinese means both “danger” and “opportunity.”

A whole industry of pundits and therapists has grown up around this one grossly inaccurate formulation. A casual search of the Web turns up more than a million references to this spurious proverb. It appears, … often complete with Chinese characters, on the covers of books, on advertisements for seminars, on expensive courses for “thinking outside of the box” and practically everywhere one turns in the world of quick-buck business, pop psychology, and orientalist hocus-pocus. …

Like most Mandarin words, that for “crisis” (weiji) consists of two syllables that are written with two separate characters, wei and ji. The ji of weiji, in fact, means something like “incipient moment; crucial point (when something begins or changes).” Thus, a weiji is indeed a genuine crisis, a dangerous moment, a time when things start to go awry. A weiji indicates a perilous situation when one should be especially wary. It is not a juncture when one goes looking for advantages and benefits. In a crisis, one wants above all to save one’s skin and neck!

source: By Any Other Name, New York Times, December 18, 2005

Pinyin Info in the news

Nathan Bierma’s most recent column on linguistics for the Chicago Tribune’s Tempo section contains excerpts from an e-mail interview with yours truly.

Much of the piece focuses on Professor Victor H. Mair’s explanation, here on Pinyin Info, of how “crisis” is not “danger” plus “opportunity” in Chinese characters.

The French have a saying about incomprehensible communication. Americans say, “It’s Greek to me.” But the French say “C’est du chinois” — meaning, “It’s Chinese.”

Chinese characters are so complex that they make a good metaphor for failure to communicate. But an American copy editor living in Taiwan is trying to demystify Chinese characters and demolish a few myths about how they work.

Mark Swofford runs the Web site [Pinyin Info], a site dedicated to Pinyin, the standard system of writing Chinese words in the Roman alphabet (the alphabet used to write English).

“Most of what most people think they know about Chinese — especially when it comes to Chinese characters — is wrong,” Swofford writes at the site. “This Web site is aimed at contributing to a better understanding of the Chinese languages and how Romanization can be used to write languages traditionally associated with Chinese characters (such as Japanese, Korean and especially Mandarin Chinese).”

The Mandarin Chinese word for “crisis,” for example, is represented with an intricate symbol made with several strokes, but the word’s pronunciation can be spelled in Pinyin as “weiji” (plus a few accent marks).

Using the Pinyin system makes it easier for students to learn to speak Chinese languages, Swofford says, because Chinese characters are so complex and misunderstood — such as the frequently misinterpreted character for “weiji,” a favorite of motivational writers and speakers.

Seeking a better system

Swofford says he started his Web site in part out of frustration with the confusing and inconsistent ways street names were written in the Roman alphabet when he moved to Taiwan.

“As a professional copy editor, I found the plethora of misspellings more than just a nuisance,” Swofford says. “I started compiling lists of street and place names so that I would be able to know the correct spellings.”

Swofford’s Pinyin site features news articles about Chinese writing, original essays about Pinyin, spelling quizzes, song lyrics written in Pinyin and sample chapters of books on Pinyin.

“The Mandarin Chinese language has about 410 distinct syllables, not counting variations based on tones,” Swofford writes by e-mail from Taiwan, where he is a copy editor at Kainan University. “All can be written simply and unambiguously using the Roman alphabet.”

Swofford lists all of the syllables written in Pinyin, alongside the characters they represent, at [Pinyin Info].

“One needn’t be a student of Mandarin or a scholar to make use of the readings on my site,” Swofford says. “Most of the readings are in English and require no prior knowledge of anything about the Sinitic [Chinese] languages.”

Victor Mair is an avid reader and regular contributor to [Pinyin Info]. Mair is professor of Chinese language and literature at the University of Pennsylvania, where he teaches a course called “Language, Script and Society in China.”

Mair believes that Western teachers often overemphasize the need to learn and read Chinese characters. By learning Chinese with a Romanized alphabet instead of characters, he says, students are able to start speaking the language more quickly.

‘Crisis’ clarified

Chinese characters themselves are often misunderstood, Mair says. Many students and scholars fail to realize there is a difference between Chinese characters and Chinese languages, he says, which can lead to problems because the meaning of the characters depends on the language and culture where they are used.

This confusion is partly to blame for the common claim of self-help books that the Chinese character for the word “crisis” means both “danger” and “opportunity.”

“A whole industry of pundits and therapists has grown up around this one grossly inaccurate formulation,” Mair writes at [Pinyin Info]. “The explication of the Chinese word for ‘crisis’ as made up of two components signifying ‘danger’ and ‘opportunity’ is due partly to wishful thinking, but mainly to a fundamental misunderstanding about how terms are formed in Mandarin and other Sinitic languages.”

According to the myth, to write the Chinese character for “crisis,” you combine the character for “danger” and the character for “opportunity.”

That’s based on a partial truth: the word pronounced “weiji” is made up of two characters, pronounced “wei” and “ji.” But while “wei” means danger, “ji” doesn’t mean “opportunity.”

“The ‘ji’ of ‘weiji,’ in fact, means something like ‘incipient moment; crucial point (when something begins or changes),’” Mair writes. “Thus, a ‘weiji’ is indeed a genuine crisis, a dangerous moment. . . . A ‘weiji’ in Chinese is every bit as fearsome as a crisis in English.”

The word “ji” only means “opportunity” in some cases, such as when it combines with the word “hui” (“occasion”) to make the word “jihui,” for “opportunity.” Its meaning changes depending on what other word it’s blending with. The crisis-means-opportunity myth, Mair says, is founded on a faulty understanding of the way languages work.

“There will always be some degree of misinterpretation about other peoples and their languages,” Mair writes by e-mail, “but I’m hoping to reduce misunderstanding through critical thinking and clear education.”

Here’s the article: Debunking misconceptions about Chinese characters. (Reading the piece, however, requires jumping through some registration hoops. Perhaps Bierma will later add it to his archive of some of his work, which contains much of interest.) It was published in the Chicago Tribune on November 9, 2005.
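
Since the column mentions that “weiji” takes “a few accent marks,” here’s a small Python sketch of going the other way: stripping the tone marks from Pinyin to get the bare syllables. It assumes the four tone marks are the macron, acute, caron, and grave, and it takes care to keep the two dots of ü. The function name is just something I made up for illustration.

```python
import unicodedata

# Combining macron, acute, caron, and grave: the marks for tones 1 through 4.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

def strip_tone_marks(text):
    """Remove Pinyin tone marks but keep the diaeresis of ü."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # NFC puts any remaining marks (such as the dots of ü) back together.
    return unicodedata.normalize("NFC", kept)

print(strip_tone_marks("wēijī"))  # weiji
print(strip_tone_marks("nǚ"))     # nü
```

Decomposing first is what lets the tone mark be dropped while the two dots stay put, since a letter like ǚ carries both.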