New database of cross-strait differences in Mandarin goes online

Last week, on the same day President Ma Ying-jeou accepted the resignation of a minister who made some drunken lewd remarks at a wěiyá (year-end office party), Ma was joking to the media about blow jobs.


screenshot from a video of a news story on this

But it was all for a good cause, of course. You see, the Mandarin expression chuī lǎba, when not referring to the literal playing of a trumpet, is usually taken in Taiwan to refer to a blow job. But in China, Ma explained, chuī lǎba means the same thing as the idiom pāi mǎpì (pat/kiss the horse’s ass — i.e., flatter). And now that we have the handy-dandy Zhōnghuá Yǔwén Zhīshikù (Chinese Language Database), which Ma was announcing, we can look up how Mandarin differs in Taiwan and China, and thus not get tripped up by such misunderstandings. Or at least that’s supposed to be the idea.

The database, which is the result of cross-strait cooperation, can be accessed via two sites: one in Taiwan, the other in China.

It’s clear that a lot of money has been spent on this. For example, many entries are accompanied by well-documented, precise explanations by distinguished lexicographers. Ha! Just kidding! Many entries are really accompanied by videos — some two hundred of them — of cutesy puppets gabbing about cross-strait differences in Mandarin expressions. But if there’s a video in there of the panda in the skirt explaining to the sheep in the vest that a useful skill for getting ahead in Chinese society is chuī lǎba, I haven’t found it yet. Will NMA take up the challenge?

Much of the site emphasizes not so much language as Chinese characters. For example, another expensively produced video feeds the ideographic myth by showing off obscure Hanzi, such as the one for chěng.

WARNING: The screenshot below links to a video that contains scenes with intense wawa-ing and thus may not be suitable for anyone who thinks it’s not really cute for grown women to try to sound like they’re only thwee-and-a-half years old.


In a welcome bit of synchronicity, Victor Mair posted on Language Log earlier the same week on the unpredictability of Chinese character formation and pronunciation, briefly discussing just such patterns of duplication, triplication, etc.

Mair notes:

Most of these characters are of relatively low frequency and, except for a few of them, neither their meanings nor their pronunciations are known by persons of average literacy.

Many more such characters consisting of two, three, or four repetitions of the same character exist, and their sounds and meanings are in most cases equally or more opaque.

The Hanzi for chěng (which looks like 馬馬馬 run together as one character) in the video above is sufficiently obscure that it likely won’t be shown correctly in many browsers on most systems when written in real text: 𩧢. But never fear: It’s already in Unicode and so should be appearing one of these years in a massively bloated system font.
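For the curious, here is a minimal Python sketch of why a character like this tends to be troublesome. It assumes the character pasted above survives copying intact, and the variable names are only for illustration. It checks whether the code point lies outside Unicode's Basic Multilingual Plane, in which case UTF-16 needs a surrogate pair and UTF-8 needs four bytes.

```python
# The character from the post, pasted as literal text (an assumption:
# it must survive copy-and-paste intact for the checks to be meaningful).
cheng = "𩧢"

code_point = ord(cheng)

# Characters above U+FFFF live outside the Basic Multilingual Plane.
outside_bmp = code_point > 0xFFFF

# UTF-16 uses 2-byte code units; a character outside the BMP takes two
# of them (a surrogate pair).
utf16_units = len(cheng.encode("utf-16-le")) // 2

# UTF-8 uses four bytes for anything outside the BMP.
utf8_bytes = len(cheng.encode("utf-8"))

print(f"code point: U+{code_point:04X}")
print("outside the BMP:", outside_bmp)
print("UTF-16 code units:", utf16_units)
print("UTF-8 bytes:", utf8_bytes)
```

Older software that assumes one 16-bit unit per character is the sort of thing that tends to mangle characters like this one, quite apart from whether any installed font has a glyph for it.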

Further reinforcing the impression that the focus is on Chinese characters, Liú Zhàoxuán, who is the head of the association in charge of the project on the Taiwan side, equated traditional Chinese characters with Chinese culture itself and declared that getting the masses in China to recognize them is an important mission. (Liu really needs to read Lü Shuxiang’s “Comparing Chinese Characters and a Chinese Spelling Script — an evening conversation on the reform of Chinese characters.”)

Then he went on about how Chinese characters are a great system because, supposedly, they have a one-to-one correspondence with language that other scripts cannot match, people can know what they mean just by looking at them (!), and they therefore have a high degree of artistic quality (gāodù de yìshùxìng). Basically, the person in charge of this project seems to have a bad case of the Like Wow syndrome, which is not a reassuring trait in someone responsible for producing a dictionary.

The same cooperation that built the Web sites led to a new book, Liǎng’àn Měirì Yī Cí (《兩岸每日一詞》 / Roughly: Cross-Strait Term-a-Day Book), which was also touted at the press conference.

The book contains Hanyu Pinyin, as well as zhuyin fuhao. But, alas, the book makes the Pinyin look ugly and fails completely at the first rule of Pinyin: use word parsing. (In the online images from the book, such as the one below, all of the words are se pa ra ted in to syl la bles.)

The Web site also has ugly Pinyin, with the CSS file for the Taiwan site calling for Pinyin to be shown in SimSun, which is one of the fonts it’s better not to use for Pinyin. But the word parsing on the Web site is at least not always wrong. Here are a few examples.

  • “跑神兒” is given as pǎoshénr (good).
  • And apostrophes appear to be used correctly: e.g., fàn’ān (販安), chūn’ān (春安), and fēi’ān (飛安).
  • But “第二春” is run together as “dìèrchūn” (no hyphen) rather than written correctly as dì-èr chūn.
  • And “一個頭兩個大” is given as yíɡe tóu liǎnɡɡe dà (for Taiwan) and yīɡe tóu liǎnɡɡe dà (for China). But ge is supposed to be written separately. (The variation of tone for yi is in this case useful.)
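The apostrophe rule that the examples above follow can be sketched in a few lines of Python. This is only an illustration, assuming the usual statement of the rule: an apostrophe goes before a non-initial syllable that begins with a, o, or e. The function name join_syllables is hypothetical, and the sketch makes no attempt at word parsing, hyphens (such as the one in dì-èr), or tone-mark placement.

```python
APOSTROPHE = "\u2019"  # the curly apostrophe used in this post

# a, o, e with and without tone marks (precomposed characters)
A_O_E = set("aāáǎàoōóǒòeēéěè")


def join_syllables(syllables):
    """Join the Pinyin syllables of ONE word into a single string.

    An apostrophe is inserted before any non-initial syllable that
    begins with a, o, or e, so the syllable boundary stays unambiguous.
    """
    word = ""
    for index, syllable in enumerate(syllables):
        if index > 0 and syllable[0].lower() in A_O_E:
            word += APOSTROPHE
        word += syllable
    return word


print(join_syllables(["fàn", "ān"]))     # apostrophe needed
print(join_syllables(["pǎo", "shénr"]))  # no apostrophe needed
```

The caller still has to decide which syllables belong together as a word, which is exactly the part the book gets wrong.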

Still, my general impression from this is that we should not expect the forthcoming cross-strait dictionary to be very good.


Early instances of misunderstandings of biblical proportions

old-style Hanzi for ?

From time to time I come across references by the credulous to the supposed biblical roots of some Chinese characters. I was surprised to learn, however, that that manner of interpretation has been around for many years.

In his 1902 book China and the Chinese, Herbert A. Giles (of Wade-Giles fame) pointed out the flaw he had seen in some earlier work.

Even the early Jesuit Fathers of the seventeenth and eighteenth centuries, to whom we owe so much for pioneer work in the domain of Sinology, were not without occasional lapses of the kind, due no doubt to a laudable if excessive zeal. Finding the character 船, which is the common word for “a ship,” as indicated by 舟, the earlier picture-character for “boat” seen on the left-hand side, one ingenious Father proceeded to analyse it as follows: —

舟 “ship,” 八 “eight,” 口 “mouth” = eight mouths on a ship—“the Ark.”

But the right-hand portion is merely the phonetic of the character; it was originally 鉛 “lead,” which gave the sound required; then the indicator “boat” was substituted for “metal.”

So with the word 禁 “to prohibit.” Because it could be analysed into two 木木 “trees” and 示 “a divine proclamation,” an allusion was discovered therein to the two trees and the proclamation of the Garden of Eden; whereas again the proper analysis is into indicator and phonetic.

Nor is such misplaced ingenuity confined to the Roman Catholic Church. In 1892 a Protestant missionary published and circulated broadcast what he said was “evidence in favour of the Gospels,” being nothing less than a prophecy of Christ’s coming hidden in the Chinese character 來 “to come.” He pointed out that this was composed of “a cross,” with two 人人 ‘men,’ one on each side, and a ‘greater man’ 人 in the middle.

That analysis is all very well for the character as it stands now; but before the Christian era this same character was written and was a picture, not of men and of a cross, but of a sheaf of corn. It came to mean “come,” says the Chinese etymologist, “because corn comes from heaven.”

Even if all the character etymologies Giles cites are not necessarily in keeping with modern scholarship, his principles here are correct.

The where and why of missing second tones

image of 'zhong' written with 1st, 2nd, 3rd, and 4th tone -- with the 2nd-tone one in light gray instead of black text

My previous post mentioned that not all tonal permutations exist in the real world. For example, modern standard Mandarin has zhōng, zhǒng, and zhòng, but doesn’t have zhóng. I did not, however, get into any of the reasons for the absence of second-tone zhong.

Fortunately, my friend James E. Dew, who is much more qualified than I to discuss such fine points of linguistics, was kind enough to send in the explanation below. Jim used to teach the Chinese language and linguistics at the University of Michigan; and for many years he directed the Inter-University Program (a.k.a. the Stanford Center) in Taipei. He is also the author of 6000 Chinese Words: A Vocabulary Frequency Handbook and coauthor of Classical Chinese: A Functional Approach.

Most simply stated, Mandarin syllable shapes with unaspirated occlusive initials and nasal finals don’t occur in second tone. This can be restated a bit less opaquely for those who have not studied Chinese historical phonology, as follows:

Syllables that begin with unaspirated stops b, d, g, or affricates j, zh, z, and end in a nasal n or ng, as a rule don’t have second-tone forms. There are a few exceptions, such as béng (甭 / “needn’t”) and zán (咱 / “we”), which were new words formed by contraction — from búyòng and zámén, respectively — after the tone class split described below took place.

This came about because when Middle Chinese (of Sui-Tang times) píngshēng 平声/平聲 split into yīnpíng 阴平/陰平 (modern Mandarin “first tone”) and yángpíng 阳平/陽平 (M “second tone”), syllables with aspirated initials went into the new yángpíng class, while those with unaspirated initials all fell into the yīnpíng (M first tone) group, thus leaving no unaspirated syllables with nasal finals in the modern Mandarin second tone class.
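As a rough illustration of the rule as stated above (and not a full account of Mandarin phonology), a toneless Pinyin syllable can be tested like this in Python. The function name and the exception set are assumptions made for the sketch; the only exceptions listed are the two contractions mentioned above, and the rule itself holds only “as a rule.”

```python
# Unaspirated stops and affricates named in the rule.
# ("z" would also match "zh", but both are listed for clarity.)
UNASPIRATED_INITIALS = ("b", "d", "g", "j", "zh", "z")

# Nasal finals named in the rule.
NASAL_ENDINGS = ("n", "ng")

# Contractions cited above as exceptions: béng and zán.
KNOWN_EXCEPTIONS = {"beng", "zan"}


def in_second_tone_gap(syllable):
    """Return True if a toneless Pinyin syllable (e.g. 'zhong') has the
    shape that, as a rule, lacks a second-tone form in Mandarin."""
    s = syllable.lower()
    has_unaspirated_initial = s.startswith(UNASPIRATED_INITIALS)
    has_nasal_final = s.endswith(NASAL_ENDINGS)
    return (has_unaspirated_initial
            and has_nasal_final
            and s not in KNOWN_EXCEPTIONS)


for test_syllable in ["zhong", "chong", "guo", "beng"]:
    print(test_syllable, in_second_tone_gap(test_syllable))
```

Here zhong falls into the gap, chong does not (its initial is aspirated), guo does not (it ends in a vowel), and beng is excluded as one of the contractions.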

An interesting corollary to this rule is that among Mandarin “open” syllables (those that end in a vowel) with the above-listed initials, almost all of the second-tone syllables derive from Middle Chinese rùshēng 入声/入聲, and their cognates have stop endings in the southern dialects that preserve rùshēng, as illustrated by the Cantonese examples given below.

For those who like to pronounce what they read, Cantonese rùshēng syllables have level tones, either high, mid or low. In the Yale romanization used here, high tone is marked with a macron (e.g., dāk), mid tone is unmarked, and low tone is signified by an h following the vowel. A double “aa” sounds like the “a” in “father,” while a single “a” is a mid central vowel. Thus baht sounds like English “but” and dāk sounds like English “duck.”

Hanzi    Mandarin  Cantonese
         bái       baahk
         báo       bohk
別/别    bié       biht
敵/敌    dí        dihk
閣/阁    gé        gok
國/国    guó       gwok
極/极    jí        gihk
夾/夹    jiá       gaap
結/结    jié       git
節/节    jié       jit
覺/觉    jué       gok
決/决    jué       kyut
雜/杂    zá        jaahp
澤/泽    zé        jaahk
閘/闸    zhá       jaahp
         zhái      jaahk
         zhé       jit
執/执    zhí       jāp
         zhí       jihk
         zhú       jūk
濁/浊    zhuó      juhk
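The tone-marking conventions described above for these Cantonese rùshēng syllables can be spelled out as a small Python sketch. It is not a general Yale romanization parser: it assumes the input is one of the stop-final syllables of the kind listed in the table, and the function name is made up for illustration.

```python
import re
import unicodedata


def rusheng_tone_level(yale_syllable):
    """Classify a Yale-romanized Cantonese rùshēng syllable as
    'high', 'mid', or 'low', using only the conventions described above:
    macron = high, h after the vowel = low, otherwise mid."""
    s = unicodedata.normalize("NFC", yale_syllable.lower())
    if re.search(r"[āēīōū]", s):
        return "high"
    # An h directly after a vowel marks low tone; an initial h does not count.
    if re.search(r"[aeiou]h", s):
        return "low"
    return "mid"


for syllable in ["dāk", "gok", "baahk"]:
    print(syllable, rusheng_tone_level(syllable))
```

Run over the table, this would put jāp and jūk in the high group, gok, gaap, and kyut in the mid group, and baahk, dihk, and juhk in the low group.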

Wenlin releases major upgrade (4.0)

Wenlin logo

One of my favorite programs, Wenlin (which bills itself as “software for learning Chinese”), has just released a major upgrade for both Mac and Windows versions. This doesn’t happen often; it has been three-and-a-half years since the most recent big change was issued (Wenlin 3.4) and heaven only knows how long since 3.0 came out. So, yes, this release has many substantial improvements.

One of the features nearest and dearest to my heart is Wenlin 4.0’s greatly improved handling of Pinyin. I was among the field testers for the new version, so I’ve already spent a lot of time examining this feature. Here are a few important aspects of this:

  • Conversions from Chinese characters follow Hanyu Pinyin orthography much more closely than before. This is a major change for the better. (There’s still some room for improvement. But I don’t think we’ll have to wait years for this.)
  • In the past, using Wenlin to convert long texts in Chinese characters into Pinyin could be a real chore, with users having to examine example after example of Chinese characters with multiple pronunciations in order to select the proper pronunciation for that particular context. But now users may, if they so desire, tell Wenlin not to ask for disambiguation input. Of course, that doesn’t mean that Wenlin will always guess right; but many users will be happy that this trade-off allows them to skip the frustration of, for example, having to tell the program over and over and over that, yes, in this case 說 is pronounced shuō rather than shuì.
  • Relative newcomers to Mandarin may appreciate that for common words tone sandhi is indicated in Wenlin with additional marks (a dot or line below the vowel). This feature can also be turned off, for those who want standard Pinyin.

There are, of course, many improvements beyond the area of Pinyin. Here are a few:

  • One limitation of Wenlin 3.x was that its English dictionary wasn’t very large. But Wenlin 4.0 includes not only the ABC Chinese-English Comprehensive Dictionary but also the excellent new ABC English-Chinese, Chinese-English Dictionary (now finally in stock in the printed version).
  • The flashcards are now set up to handle not just individual characters but polysyllabic words.
  • There’s full Unicode Unihan 6.0 support for more than 75,000 Chinese characters.
  • And for those who think 75,000 just isn’t enough, users can now access Wenlin’s CDL technology. Through this, users can create new, variant, and rare characters; moreover, these can be published and shared with other Wenlin users or CDL-friendly devices.
  • Seal script versions of more than 11,000 characters are provided.
  • Wenlin contains an e-edition of the Shuowen Jiezi (Shuōwén Jiězì / 說文解字 / 说文解字).
  • Coders will be interested to know that Wenlin appears to be headed toward becoming open-source.
  • Both Mandarin and English entries are marked with grade levels, which aids learners by indicating relative frequency of use. The levels for Mandarin words are based on the Hanyu Shuiping Kaoshi (Hànyǔ Shuǐpíng Kǎoshì / 汉语水平考试 / 漢語水平考試 / HSK).

The full version (i.e., a CD of the program that comes in a box, likely packaged with a hard copy of the manual) is US$199, or US$179 if you download it from the Wenlin Web store. Upgrades from 3.x cost US$49.

For more information, see the summary of features and outline of what’s new in Wenlin 4.0.

screenshot from Wenlin 4.0 -- click for larger version

Le Grand Ricci now available on DVD

cover of le Grand Ricci numerique

The magnificent Grand dictionnaire Ricci de la langue chinoise, better known as le Grand Ricci, has just been released on DVD, almost a decade after its release in book form and exactly four hundred years after the death of Matteo Ricci.

The list price is 120 euros (about US$150), which is much cheaper than the printed edition. A long video in French (16:31) discusses the work. For those who would prefer something in English, a PDF gives background information on the dictionary project.

For a sample of the dictionary’s format and entries, see the 25 pages of entries for shan. Alas, as this example shows, the entries are not word parsed. But at least Hanyu Pinyin is now available for those who prefer it to Wade-Giles.

As long as I’m mentioning Ricci-related work, I might as well use the occasion to note that the Taipei Ricci Institute is putting its collection of books on permanent loan to Taiwan’s National Central Library.

Also, I’d like to note that parts of Matteo Ricci’s original dictionary can now be viewed through the Google Books scan of a publication from earlier this century of his Dicionário Português-Chinês.


image from a manuscript page of Ricci's original dictionary

books bought in Beijing

cover of a book by Zhou Youguang

I didn’t have any luck finding anything in Sin Wenz (Lādīnghuà Xīn Wénzì / 拉丁化新文字), despite trips to several large used book stores. (Fortunately, the Internet is now providing some leads. Thanks, Brendan and Joel!) But I did find some other books to bring home.

I acquired lots of books by Zhou Youguang, not all of which focus primarily on linguistics:

Other than the Zhou Youguang books, here are my favorite finds of the trip, as they are for the most part in correctly word-parsed Hanyu Pinyin (with Hanzi underneath), along with a few notes in English:

I’ll soon be posting more about the above books with Pinyin, so watch this site for updates. Really, this is gonna be good.

Although this collection of Y.R. Chao says it’s volume 15, it’s actually two books:

  • Zhào Yuánrèn quánjí, dì 15 juàn (赵元任全集第15卷)

Some more titles:

  • Measured Words: The Development of Objective Language Testing, by Bernard Spolsky
  • Pǔtōnghuà shuǐpíng cèshì shíshī gāngyào (普通话水平测试实施纲要). Now with the great smell of beer! Sorry, Brendan, I owe you one — more than one, actually.

I bought the following because their author is Yin Binyong, the scholar primarily responsible for Hanyu Pinyin’s orthography. They are titles from Sinolingua’s series Bógǔtōngjīn xué Hànyǔ cóngshū (“Gems of the Chinese Language through the Ages,” in their translation), all of which are in Mandarin (Hanzi) and English, with Pinyin only for the sayings being illustrated:

cover of 'Chinese-English Dictionary of Polyphonic Characters' (?????????)

cover of 'Putonghua shuiping ceshi shishi gangyao' (普通话水平测试实施纲要)

cover of 'Xinhua pinxie cidian'


And finally:

Of course I already have that one — more than one copy, in fact. But it’s always good to have more than one spare when it comes to one of the two most important books on Pinyin orthography. I really need to follow up on my requests to use excerpts from this book, as it is the only major title missing from my list of romanization-related books (though it’s in Mandarin only).

sign in a Beijing bookstore reading 'Education Theury' [sic]

Sino-Tibetan, Indo-European, and the word for ‘wheel’

The latest rerelease from Sino-Platonic Papers is “Sino-Tibetan *kolo ‘Wheel’” (800 KB PDF), by Robert S. Bauer. Those of you who like historical linguistics should be sure to read this one.


That the horse-drawn chariot appeared suddenly in China in the Shang Dynasty (ca. 1500-1066 BC) has led some Western scholars to believe that it was not independently invented by the Chinese but was introduced there by Western invaders. This paper is based on the premise that there is a connection between the transmission of the horse-drawn chariot from the West into China and the origin of some words meaning “wheel” and “wheeled-vehicle” in Sino-Tibetan languages. In particular, the paper proposes that words for “wheel” in some northern Chinese dialects and Bodic (Tibetan) languages are ultimately derived from an Indo-European source. On the basis of the comparison of words for “wheel” from various Sinitic and Bodic languages, the author has reconstructed the Proto-Sino-Tibetan root *kolo “wheel” which is itself an Indo-European contact loanword.

This was first published in August 1994 as issue no. 47 of Sino-Platonic Papers.

The Art of War: a companion volume

Sonshi, the largest website dedicated to Sun Zi’s (Sun Tzu’s) Art of War, recently selected Victor H. Mair’s new translation as “the #1 Art of War edition.”

In announcing its judgment, the site stated, “how rare a book that courageously stands up to centuries of established thought, proceeds to knock it down with sound logic and proof, and succeeds in convincing even the Old Guard to change their views.”

Professor Mair has just published a free, book-length companion to his translation: Soldierly Methods: Vade Mecum for an Iconoclastic Translation of Sun Zi bingfa, with a complete transcription and word-for-word glosses of the Manchu translation by H. T. Toh (1 MB PDF).

Yes, all that and Manchu too. The appendixes might well supply the longest text in romanized Manchu available online — not to mention the longest one with English translation. (Perhaps someone from Echoes of Manchu can comment.)

And I’d like to note the introduction to the transcription offers a cool word I hadn’t come across before: Mandjurist, which is German for “Manchu philologist.”

Here’s the table of contents:

  • Preface
  • Principles of Translation
  • Guide to Pronunciation
  • Key Terms
  • Abbreviations
  • Discussion
    • The Book and Its Title
    • Authorship
    • Historical Background
    • Dating
    • Stylistics and Statistics
    • Techniques and Technology
    • Taoistic Aspects
    • Eurasian Parallels
    • On the World Stage
    • Notes
  • Appendix I: The Pseudo-Biography of Sun Wu
  • Appendix II: Further Notes on Selected Key Terms
  • Appendix III: Transcription of the Manchu Translation of the Sun Zi with Word-for-Word English Glosses by Hoong Teik Toh
  • Appendix IV: Transcription of the Manchu Translation of the Sun Zi by Hoong Teik Toh
  • Bibliography

This is issue no. 178 of Sino-Platonic Papers.