New database of cross-strait differences in Mandarin goes online

Last week, on the same day President Ma Ying-jeou accepted the resignation of a minister who made some drunken lewd remarks at a wěiyá (year-end office party), Ma was joking to the media about blow jobs.

Classy.

screenshot from a video of a news story on this

But it was all for a good cause, of course. You see, the Mandarin expression chuī lǎba, when not referring to the literal playing of a trumpet, is usually taken in Taiwan to refer to a blow job. But in China, Ma explained, chuī lǎba means the same thing as the idiom pāi mǎpì (pat/kiss the horse’s ass — i.e., flatter). And now that we have the handy-dandy Zhōnghuá Yǔwén Zhīshikù (Chinese Language Database), which Ma was announcing, we can look up how Mandarin differs in Taiwan and China, and thus not get tripped up by such misunderstandings. Or at least that’s supposed to be the idea.

The database, which is the result of cross-strait cooperation, can be accessed via two sites: one in Taiwan, the other in China.

It’s clear that a lot of money has been spent on this. For example, many entries are accompanied by well-documented, precise explanations by distinguished lexicographers. Ha! Just kidding! Many entries are really accompanied by videos (some two hundred of them) of cutesy puppets gabbing about cross-strait differences in Mandarin expressions. But if there’s a video in there of the panda in the skirt explaining to the sheep in the vest that a useful skill for getting ahead in Chinese society is chuī lǎba, I haven’t found it yet. Will NMA take up the challenge?

Much of the site emphasizes not so much language as Chinese characters. For example, another expensively produced video feeds the ideographic myth by showing off obscure Hanzi, such as the one for chěng.

WARNING: The screenshot below links to a video that contains scenes with intense wawa-ing and thus may not be suitable for anyone who thinks it’s not really cute for grown women to try to sound like they’re only thwee-and-a-half years old.

cheng3

In a welcome bit of synchronicity, Victor Mair posted on Language Log earlier the same week on the unpredictability of Chinese character formation and pronunciation, briefly discussing just such patterns of duplication, triplication, etc.

Mair notes:

Most of these characters are of relatively low frequency and, except for a few of them, neither their meanings nor their pronunciations are known by persons of average literacy.

Many more such characters consisting of two, three, or four repetitions of the same character exist, and their sounds and meanings are in most cases equally or more opaque.

The Hanzi for chěng (which looks like 馬馬馬 run together as one character) in the video above is sufficiently obscure that it likely won’t be shown correctly in many browsers on most systems when written in real text: 𩧢. But never fear: It’s already in Unicode and so should be appearing one of these years in a massively bloated system font.

Further reinforcing the impression that the focus is on Chinese characters, Liú Zhàoxuán, who is the head of the association in charge of the project on the Taiwan side, equated traditional Chinese characters with Chinese culture itself and declared that getting the masses in China to recognize them is an important mission. (Liu really needs to read Lü Shuxiang’s “Comparing Chinese Characters and a Chinese Spelling Script — an evening conversation on the reform of Chinese characters.”)

Then he went on about how Chinese characters are a great system because, supposedly, they have a one-to-one correspondence with language that other scripts cannot match and people can know what they mean by looking at them (!) and that they therefore have a high degree of artistic quality (gāodù de yìshùxìng). Basically, the person in charge of this project seems to have a bad case of the Like Wow syndrome, which is not a reassuring trait for someone in charge of producing a dictionary.

The same cooperation that built the Web sites led to a new book, Liǎng’àn Měirì Yī Cí (《兩岸每日一詞》 / Roughly: Cross-Strait Term-a-Day Book), which was also touted at the press conference.

The book contains Hanyu Pinyin, as well as zhuyin fuhao. But, alas, the book makes the Pinyin look ugly and fails completely at the first rule of Pinyin: use word parsing. (In the online images from the book, such as the one below, all of the words are se pa ra ted in to syl la bles.)

The Web site also has ugly Pinyin, with the CSS file for the Taiwan site calling for Pinyin to be shown in SimSun, which is one of the fonts it’s better not to use for Pinyin. But the word parsing on the Web site is at least not always wrong. Here are a few examples.

  • “跑神兒” is given as pǎoshénr (good).
  • And apostrophes appear to be used correctly: e.g., fàn’ān (販安), chūn’ān (春安), and fēi’ān (飛安).
  • But “第二春” is run together as “dìèrchūn” (no hyphen) rather than as shown correctly as dì-èr chūn.
  • And “一個頭兩個大” is given as yíɡe tóu liǎnɡɡe dà (for Taiwan) and yīɡe tóu liǎnɡɡe dà (for China). But ge is supposed to be written separately. (The variation of tone for yi is in this case useful.)
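For anyone who wants to check apostrophe placement mechanically, here is a minimal Python sketch. It assumes the standard Hanyu Pinyin convention that a non-initial syllable beginning with a, o, or e takes an apostrophe, and it expects each word to arrive already split into syllables. The function names are mine, purely for illustration; this says nothing about how the database's own site generates its Pinyin.

```python
import unicodedata


def needs_apostrophe(syllable):
    """True if a non-initial syllable starts with a, o, or e (tone marks ignored)."""
    # NFD splits a letter such as "ā" into "a" plus a combining tone mark,
    # so the first code point is always the bare letter.
    first = unicodedata.normalize("NFD", syllable)[0].lower()
    return first in "aoe"


def join_syllables(syllables):
    """Join the syllables of ONE word, adding an apostrophe where required."""
    word = syllables[0]
    for syl in syllables[1:]:
        word += ("\u2019" if needs_apostrophe(syl) else "") + syl
    return word


print(join_syllables(["fàn", "ān"]))      # fàn’ān
print(join_syllables(["chūn", "ān"]))     # chūn’ān
print(join_syllables(["pǎo", "shénr"]))   # pǎoshénr
```

Note that this covers only apostrophes. Deciding where one word ends and the next begins (the part the book gets wrong) needs a dictionary or a human, not a three-line rule.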

Still, my general impression from this is that we should not expect the forthcoming cross-strait dictionary to be very good.

Early instances of misunderstandings of biblical proportions

old-style Hanzi for 來

From time to time I come across references by the credulous to the supposed biblical roots of some Chinese characters. I was surprised to learn, however, that that manner of interpretation has been around for many years.

In his 1902 book China and the Chinese, Herbert A. Giles (of Wade-Giles fame) pointed out the flaw he had seen in some earlier work.

Even the early Jesuit Fathers of the seventeenth and eighteenth centuries, to whom we owe so much for pioneer work in the domain of Sinology, were not without occasional lapses of the kind, due no doubt to a laudable if excessive zeal. Finding the character 船, which is the common word for “a ship,” as indicated by 舟, the earlier picture-character for “boat” seen on the left-hand side, one ingenious Father proceeded to analyse it as follows: —

舟 “ship,” 八 “eight,” 口 “mouth” = eight mouths on a ship—“the Ark.”

But the right-hand portion is merely the phonetic of the character; it was originally 铅 “lead,” which gave the sound required; then the indicator “boat” was substituted for “metal.”

So with the word 禁 “to prohibit.” Because it could be analysed into two 木木 “trees” and 示 “a divine proclamation,” an allusion was discovered therein to the two trees and the proclamation of the Garden of Eden; whereas again the proper analysis is into indicator and phonetic.

Nor is such misplaced ingenuity confined to the Roman Catholic Church. In 1892 a Protestant missionary published and circulated broadcast what he said was “evidence in favour of the Gospels,” being nothing less than a prophecy of Christ’s coming hidden in the Chinese character 來 “to come.” He pointed out that this was composed of “a cross,” with two 人人 ‘men,’ one on each side, and a ‘greater man’ 人 in the middle.

That analysis is all very well for the character as it stands now; but before the Christian era this same character was written and was a picture, not of men and of a cross, but of a sheaf of corn. It came to mean “come,” says the Chinese etymologist, “because corn comes from heaven.”

Even if all the character etymologies Giles cites are not necessarily in keeping with modern scholarship, his principles here are correct.

China and U.S. study-abroad programs

The top 10 destinations for U.S. students studying abroad were unchanged in the 2009–2010 school year compared to the year before. China remained in fifth place, with its numbers up only 1.7% over the previous year.

Number of U.S. students studying abroad, by destination and year

By far the largest gains of destinations in the top 25 were those by Israel (60.7% — up to 3,146 visiting students) and India (44.4% — up to 3,884). Though not in the top 25, Taiwan also experienced very strong growth at 42.4% (850 students) — far higher than any other country in East Asia.

In second place for growth in East Asia was Japan (6.6%), which will soon replace Costa Rica in the top 10 if trends continue.

For places of origin of international students studying in the United States, China was by far the leader, with 157,558 students, about 50% more than India’s 103,895 students in the States. Third and fourth places were held by South Korea and Canada, respectively. Taiwan was fifth with 24,818 students.

It’s Poetry Time

Shì shíhou le

Shì shíhou le, Zhōngguórén! Shì shíhou le
Guǎngchǎng shì dàjiā de
Jiǎo shì zìjǐ de
Shì shíhou yòng jiǎo qù guǎngchǎng zuòchū xuǎnzé

Shì shíhou le, Zhōngguórén! Shì shíhou le
Gēqǔ shì dàjiā de
Hóu[lóng] shì zìjǐ de
Shì shíhou yòng hóu[lóng] chàngchū xīndǐ de gēqǔ

Shì shíhou le, Zhōngguórén! Shì shíhou le
Zhōngguó shì dàjiā de
Xuǎnzé shì zìjǐ de
Shì shíhou yòng zìjǐ xuǎnzé wèilái de Zhōngguó

—Zhū Yúfū (朱虞夫 / Zhu Yufu), recently arrested in China for just this poem

Thanks to VHM for finding the full text of this poem for me.

Not the same sound

Today’s New York Times exhibits one of my pet peeves. (Yes, I do seem to have a lot of those.)

This particular one is the practice of declaring that some Mandarin word or expression has “the same sound” as something else — even though it doesn’t. Claiming that the Mandarin words for death and four sound identical is a frequent example of this.

So today we have this:

Consider Tide detergent, Taizi, whose Chinese characters literally mean “gets rid of dirt.” (Characters are important: the same sound written differently could mean “too purple.”)

Nope. The Mandarin name for Tide detergent is Tàizì. On the other hand, “too purple” would be “tài zǐ,” which is close but not the same.

Tàizì ≠ tài zǐ
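The difference is easy to demonstrate in code. The following Python sketch (my own illustration, with hypothetical function names) separates each syllable's letters from its tone, assuming the input is tone-marked Pinyin already split into syllables by spaces.

```python
import unicodedata

# Combining diacritics used for Pinyin tone marks, mapped to tone numbers.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def split_tone(syllable):
    """Return (toneless letters, tone number); 0 means no tone mark was found."""
    tone = 0
    letters = []
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that ü (u plus diaeresis) survives as a single letter.
    return unicodedata.normalize("NFC", "".join(letters)), tone


def analyze(phrase):
    """Analyze a phrase whose syllables are separated by spaces."""
    return [split_tone(s) for s in phrase.split()]


print(analyze("tài zì"))   # [('tai', 4), ('zi', 4)]
print(analyze("tài zǐ"))   # [('tai', 4), ('zi', 3)]
```

Same letters, different tone on the second syllable. Since tones are part of the sound, the two are not homophones.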

So, the answer to the question “When is a homophone not a homophone?” is “When it’s not a @#$%! homophone.”

But I will give the Times points for not mentioning wax tadpoles.

source: Picking Brand Names in China Is a Business Itself, New York Times, November 11, 2011

The where and why of missing second tones

image of 'zhong' written with 1st, 2nd, 3rd, and 4th tone -- with the 2nd-tone one in light gray instead of black text

My previous post mentioned that not all tonal permutations exist in the real world. For example, modern standard Mandarin has zhōng, zhǒng, and zhòng, but doesn’t have zhóng. I did not, however, get into any of the reasons for the absence of second-tone zhong.

Fortunately, my friend James E. Dew, who is much more qualified than I to discuss such fine points of linguistics, was kind enough to send in the explanation below. Jim used to teach the Chinese language and linguistics at the University of Michigan; and for many years he directed the Inter-University Program (a.k.a. the Stanford Center) in Taipei. He is also the author of 6000 Chinese Words: A Vocabulary Frequency Handbook and coauthor of Classical Chinese: A Functional Approach.

Most simply stated, Mandarin syllable shapes with unaspirated occlusive initials and nasal finals don’t occur in second tone. This can be restated a bit less opaquely for those who have not studied Chinese historical phonology, as follows:

Syllables that begin with unaspirated stops b, d, g, or affricates j, zh, z, and end in a nasal n or ng, as a rule don’t have second-tone forms. There are a few exceptions, such as béng (甭 / “needn’t”) and zán (咱 / “we”), which were new words formed by contraction (from búyòng and zámén, respectively) after the tone class split described below took place.

This came about because when Middle Chinese (of Sui-Tang times) píngshēng 平声/平聲 split into yīnpíng 阴平/陰平 (modern Mandarin “first tone”) and yángpíng 阳平/陽平 (M “second tone”), syllables with aspirated initials went into the new yángpíng class, while those with unaspirated initials all fell into the yīnpíng (M first tone) group, thus leaving no unaspirated syllables with nasal finals in the modern Mandarin second tone class.
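The rule of thumb stated above is simple enough to sketch in Python. This is only an illustration: the function name is hypothetical, the input is assumed to be a single toneless Pinyin syllable, and the exception list contains just the two contractions mentioned here, so it should not be taken as a complete description of Mandarin phonology.

```python
# Unaspirated stops and affricates named in the rule.
UNASPIRATED_INITIALS = ("zh", "b", "d", "g", "j", "z")

# Later contractions that break the rule (only those cited in the text).
EXCEPTIONS = {"beng", "zan"}


def lacks_second_tone(syllable):
    """Predict whether a toneless Pinyin syllable has no second-tone form."""
    if syllable in EXCEPTIONS:
        return False
    has_unaspirated_initial = syllable.startswith(UNASPIRATED_INITIALS)
    has_nasal_final = syllable.endswith(("n", "ng"))
    return has_unaspirated_initial and has_nasal_final


for syl in ["zhong", "tong", "guo", "beng"]:
    print(syl, lacks_second_tone(syl))
# zhong True
# tong False
# guo False
# beng False
```

So zhóng is predicted to be missing, while tóng (aspirated initial) and guó (no nasal final) are not ruled out.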

An interesting corollary to this rule is that among Mandarin “open” syllables (those that end in a vowel) with the above-listed initials, almost all of the second-tone syllables derive from Middle Chinese rùshēng 入声/入聲, and their cognates have stop endings in the southern dialects that preserve rùshēng, as illustrated by the Cantonese examples given below.

For those who like to pronounce what they read, Cantonese rùshēng syllables have level tones, either high, mid or low. In the Yale romanization used here, high tone is marked with a macron (e.g., dāk), mid tone is unmarked, and low tone is signified by an h following the vowel. A double “aa” sounds like the “a” in “father,” while a single “a” is a mid central vowel. Thus baht sounds like English “but” and dāk sounds like English “duck.”
Hanzi    Mandarin  Cantonese
         bá        baht
         bái       baahk
         báo       bohk
別/别    bié       biht
         bó        baak
         bó        bok
         dá        daap
         dé        dāk
敵/敌    dí        dihk
         dú        duhk
         gé        gaak
閣/阁    gé        gok
國/国    guó       gwok
         jí        gāp
極/极    jí        gihk
         jí        jaahp
夾/夹    jiá       gaap
結/结    jié       git
節/节    jié       jit
         jú        gūk
覺/觉    jué       gok
決/决    jué       kyut
雜/杂    zá        jaahp
澤/泽    zé        jaahk
閘/闸    zhá       jaahp
         zhái      jaahk
         zhé       jit
執/执    zhí       jāp
         zhí       jihk
         zhú       jūk
濁/浊    zhuó      juhk
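The Yale tone-marking conventions described above (macron for high, an h after the vowel for low, nothing for mid) can be read off mechanically. Here is a small Python sketch under those assumptions; the function name is mine, and it is meant only for rùshēng syllables like the ones in the table, not for Cantonese in general.

```python
import re
import unicodedata

MACRON = "\u0304"  # combining macron, which marks the high tone


def yale_entering_tone(syllable):
    """Classify a Yale-romanized rùshēng syllable as 'high', 'mid', or 'low'."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    if MACRON in decomposed:
        return "high"
    # Low tone is signaled by an h directly after the vowel, as in baahk.
    if re.search(r"[aeiou]h", decomposed):
        return "low"
    return "mid"


for syl in ["dāk", "baak", "baahk", "gwok", "juhk"]:
    print(syl, yale_entering_tone(syl))
# dāk high
# baak mid
# baahk low
# gwok mid
# juhk low
```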

Pinyin’s never-used letter?

As most people reading this blog know, Mandarin has about 1,300 syllables (interjections and loan words complicate the count a little). If tones (a basic part of the language) are disregarded, the number drops to 400-something syllables.

Given 410 or so basic syllables and 4 tones — one of these days I need to write something more on the wrongful neglect of the so-called neutral tone — some people might expect there to be more like 1,640 syllables instead of about 1,300. The reason for the lower number is that not all syllables exist in all four tones. For example, quite clearly the official language of Zhōngguó does not lack zhōng … or zhǒng or zhòng. But zhóng is another matter.

So not all possible tonal variations of those 400-something syllables appear in modern standard Mandarin. But what about letters?

If you look at the official alphabet for Hanyu Pinyin, it’s exactly the same as that for English (other than in pronunciation, of course), which is a bit odd, especially considering that Pinyin doesn’t use the letter v (or at least isn’t supposed to for Mandarin words).

So in this case, I’m excluding v but otherwise being expansionist about the glyphs I’m calling letters. To be specific: I’m referring to a-z, minus v, but including ā, á, ǎ, à, ē, é, ě, è, ī, í, ǐ, ì, ō, ó, ǒ, ò, ū, ú, ǔ, ù, ü, ǖ, ǘ, ǚ, and ǜ. (Even though Ī, Í, Ǐ, Ì, Ū, Ú, Ǔ, Ù, Ü, Ǖ, Ǘ, Ǚ, and Ǜ never come at the beginning of a word, let’s not automatically eliminate them, because there is an occasional need for ALL CAPS.)
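For the curious, here is roughly how such a check could be set up in Python. This is a sketch, not the method I actually used: the function names are hypothetical, and the sample string stands in for a real word list, which you would have to supply yourself.

```python
import string
import unicodedata

TONE_MARKS = ["\u0304", "\u0301", "\u030c", "\u0300"]  # macron, acute, caron, grave
VOWELS = "aeiou\u00fc"  # the last one is ü


def pinyin_glyphs():
    """Lowercase inventory: a-z minus v, plus the tone-marked vowels and plain ü."""
    base = [c for c in string.ascii_lowercase if c != "v"]
    marked = []
    for vowel in VOWELS:
        if vowel == "\u00fc":
            marked.append(vowel)  # plain ü is not in a-z, so list it here
        for mark in TONE_MARKS:
            marked.append(unicodedata.normalize("NFC", vowel + mark))
    return base + marked


def unused_glyphs(text):
    """Return the inventory glyphs that never occur in the given text."""
    seen = set(unicodedata.normalize("NFC", text.lower()))
    return [g for g in pinyin_glyphs() if g not in seen]


print(len(pinyin_glyphs()))                        # 50
print("\u01d6" in unused_glyphs("nǚ, lǜsè, lǘ"))   # True (that is ǖ)
```

Lowercasing the text first means an ALL CAPS occurrence still counts toward the corresponding lowercase glyph.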

Are there any of those possible glyphs that don’t appear at all — at least as given in the large ABC Comprehensive Chinese-English Dictionary?

The answer, perhaps surprisingly, is yes.

Which letter is it?

a. ǖ b. ǘ c. ǚ d. ǜ

Have you made your choice?

It doesn’t take much thought to eliminate C as the answer. “Nǚ” (woman) is one of those first-couple-of-Mandarin-lessons vocabulary terms. And the word for green (lǜsè) is hardly obscure either. It might be harder to think of a word with the letter ǘ; but there are some. Donkey (lǘ) is probably the most common. So the answer is A: ǖ.

It’s important to note that the lack of ǖ is in appearance only. The sound ǖ occurs in plenty of Mandarin words; it’s just that Pinyin’s simplified orthography calls for writing “u” instead where ǖ follows j, q, x, or y.
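That spelling convention can be sketched in a few lines of Python. The function name and the initial-plus-final interface are my own invention for illustration, and for simplicity the sketch treats y as if it were an initial.

```python
import unicodedata

DIAERESIS = "\u0308"  # the two dots of ü, as a combining mark


def spell(initial, final):
    """Write ü as u after j, q, x, or y; leave it alone after other initials."""
    if initial in ("j", "q", "x", "y"):
        decomposed = unicodedata.normalize("NFD", final)
        # Dropping the dots keeps any tone mark that sits on the same vowel.
        final = unicodedata.normalize("NFC", decomposed.replace(DIAERESIS, ""))
    return initial + final


print(spell("j", "ǖ"))   # jū
print(spell("q", "ǜ"))   # qù
print(spell("y", "ǘ"))   # yú
print(spell("n", "ǚ"))   # nǚ
print(spell("l", "ǜ"))   # lǜ
```

The dots survive only after n and l, which is why the written letter ǖ depends on there being a first-tone nü or lü, and I didn't find one.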

But even though I didn’t find an example of ǖ, I’d encourage font designers not to scratch it from their list of must-have glyphs for Pinyin faces, especially since teachers will no doubt want to continue giving tone-pattern drills based on four tones for all vowels, regardless. Also, someone with a searchable edition of the Hanyu Da Cidian or maybe the new Oxford online edition is probably about to use the comments to point me to some obscure entry there….

How to handle ‘de’ and interjections in Hanyu Pinyin

cover image for the book

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) deals with how to write Mandarin’s various de’s, mood particles, and interjections.

This reading is available in two versions:

I’ve already written about the principles in previous posts. For example, see