Critique of ordering of dictionaries for Mandarin Chinese

Sino-Platonic Papers has rereleased for free its very first issue, from February 1986: The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese: A Review Article of Some Recent Dictionaries and Current Lexicographical Projects (1.5 MB PDF), by Professor Victor H. Mair of the University of Pennsylvania’s Department of East Asian Languages and Civilizations.

This is an important essay that helped lead to the production of the ABC Chinese-English Comprehensive Dictionary, which is my favorite Mandarin-English dictionary.

Here is how it begins:

As a working Sinologist, each time I look up a word in my Webster’s or Kenkyusha’s I experience a sharp pang of deprivation. Having slaved over Chinese dictionaries arranged in every imaginable order (by K’ang-hsi radical, left-top radical, bottom-right radical, left-right split, total stroke count, shape of successive strokes, four-corner, three-corner, two-corner, kuei-hsieh, ts’ang-chieh, telegraphic code, rhyme tables, “phonetic” keys, and so on ad nauseam), I have become deeply envious of specialists in those languages, such as Japanese, Indonesian, Hindi, Persian, Russian, Turkish, Korean, Vietnamese, and so forth, which possess alphabetically arranged dictionaries. Even Zulu, Swahili, Akkadian (Assyrian), and now Sumerian have alphabetically ordered dictionaries for the convenience of scholars in these areas of research.

It is a source of continual regret and embarrassment that, in general, my colleagues in Chinese studies consult their dictionaries far less frequently than do those in other fields of area studies. But this is really not due to any glaring fault of their own and, in fact, they deserve more sympathy than censure. The difficulties are so enormous that very few students of Chinese are willing to undertake integral translations of texts, preferring instead to summarize, paraphrase, excerpt and render into their own language those passages which are relatively transparent. Only individuals with exceptional determination, fortitude, and stamina are capable of returning again and again to the search for highly elusive characters in a welter of unfriendly lexicons. This may be one reason why Western Sinology lags so far behind Indology (where is our Böhtlingk and Roth or Monier-Williams?), Greek studies (where is our Liddell and Scott?), Latin studies (Oxford Latin Dictionary), Arabic studies (Lane’s, disappointing in its arrangement by “roots” and its incompleteness but grand in its conception and scope), and other classical disciplines. Incredibly, many Chinese scholars with advanced degrees do not even know how to locate items in supposedly standard reference works or do so only with the greatest reluctance and deliberation. For those who do make the effort, the number of hours wasted in looking up words in Chinese dictionaries and other reference tools is absolutely staggering. What is most depressing about this profligacy, however, is that it is completely unnecessary. I propose, in this article, to show why.

First, a few definitions are required. What do I mean by an “alphabetically arranged dictionary”? I refer to a dictionary in which all words (tz’u) are interfiled strictly according to pronunciation. This may be referred to as a “single sort/tier/layer alphabetical” order or series. I most emphatically do not mean a dictionary arranged according to the sounds of initial single graphs (tzu), i.e. only the beginning syllables of whole words. With the latter type of arrangement, more than one sort is required to locate a given term. The head character must first be found and then a separate sort is required for the next character, and so on. Modern Chinese languages and dialects are as polysyllabic as the vast majority of other languages spoken in the world today (DeFrancis, 1984). In my estimation, there is no reason to go on treating them as variants of classical Chinese, which is an entirely different type of language. Having dabbled in all of them, I believe that the difference between classical Chinese and modern Chinese languages is at least as great as that between Latin and Italian, between classical Greek and modern Greek or between Sanskrit and Hindi. Yet no one confuses Italian with Latin, modern Greek with classical Greek, or Sanskrit with Hindi. As a matter of fact there are even several varieties of pre-modern Chinese just as with Greek (Homeric, Horatian, Demotic, Koine), Sanskrit (Vedic, Prakritic, Buddhist Hybrid), and Latin (Ciceronian, Low, Ecclesiastical, Medieval, New, etc.). If we can agree that there are fundamental structural differences between modern Chinese languages and classical Chinese, perhaps we can see the need for devising appropriately dissimilar dictionaries for their study.

One of the most salient distinctions between classical Chinese and Mandarin is the high degree of polysyllabicity of the latter vis-a-vis the former. There was indeed a certain percentage of truly polysyllabic words in classical Chinese, but these were largely loanwords from foreign languages, onomatopoeic borrowings from the spoken language, and dialectical expressions of restricted currency. Conversely, if one were to compile a list of the 60,000 most commonly used words and expressions in Mandarin, one would discover that more than 92% of these are polysyllabic. Given this configuration, it seems odd, if not perverse, that Chinese lexicographers should continue to insist on ordering their general purpose dictionaries according to the sounds or shapes of the first syllables of words alone.

Even in classical Chinese, the vast majority of lexical items that need to be looked up consist of more than one character. The number of entries in multiple character phrase books (e.g., P’ien-tzu lei-pien [approximately 110,000 entries in 240 chüan], P’ei-wen yün-fu [roughly 560,000 items in 212 chüan]) far exceeds those in the largest single character dictionaries (e.g., Chung-hua ta tzu-tien [48,000 graphs in four volumes], K’ang-hsi tzu-tien [49,030 graphs]). While syntactically and grammatically many of these multisyllabic entries may not be considered as discrete (i.e. bound) units, they still readily lend themselves to the principle of single-sort alphabetical searches. Furthermore, a large proportion of graphs in the exhaustive single character dictionaries were only used once in history or are variants and miswritten forms. Many of them are unpronounceable and the meanings of others are impossible to determine. In short, most of the graphs in such dictionaries are obscure and arcane. Well over two-thirds of the graphs in these comprehensive single character dictionaries would never be encountered in the entire lifetime of even the most assiduous Sinologist (unless, of course, he himself were a lexicographer). This is not to say that large single character dictionaries are unnecessary as a matter of record. It is, rather, only to point out that what bulk they do have is tremendously deceptive in terms of frequency of usage.

Strongly recommended.
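
For readers who want to see the distinction Mair draws between "single sort" ordering and ordering by head character, here is a minimal sketch in Python. The five sample words, their tone numbers, and the tie-breaking rules are my own illustrative assumptions, not taken from the essay or from any particular dictionary. Real head-character dictionaries also group entries by the character itself rather than just its syllable and tone, which this sketch glosses over.

```python
# Hypothetical sample entries: each word with its syllables given as
# (toneless Pinyin, tone number). Chosen only to illustrate the two orders.
ENTRIES = [
    ("大学", [("da", 4), ("xue", 2)]),
    ("单位", [("dan", 1), ("wei", 4)]),
    ("打开", [("da", 3), ("kai", 1)]),
    ("大家", [("da", 4), ("jia", 1)]),
    ("答案", [("da", 2), ("an", 4)]),
]


def single_sort_key(entry):
    """Order by the spelling of the whole word; tones only break ties."""
    _, syllables = entry
    spelling = "".join(syllable for syllable, _ in syllables)
    tones = tuple(tone for _, tone in syllables)
    return (spelling, tones)


def head_character_key(entry):
    """Order by the first syllable and its tone, then by each later
    syllable in turn (one extra sort per character)."""
    _, syllables = entry
    return tuple(syllables)


single_sort = [word for word, _ in sorted(ENTRIES, key=single_sort_key)]
head_character = [word for word, _ in sorted(ENTRIES, key=head_character_key)]

print("single sort:    ", " ".join(single_sort))
print("head character: ", " ".join(head_character))
```

In the single-sort list, 单位 (danwei) lands between 打开 (dakai) and 大学 (daxue) purely by spelling, so someone who knows how a word is pronounced needs only one lookup. In the head-character list, every word beginning with a "da" character comes before 单位, and finding a word means first locating its head character and then searching again within that group.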

reviews of books related to China and linguistics (2)

Sino-Platonic Papers has just released online its second compilation of book reviews. Here are the books discussed. (Note: The links below do not lead to the reviews but to other material. Use the link above.)

Invited Reviews

  • William A. Boltz, “The Typological Analysis of the Chinese Script.” A review article of John DeFrancis, Visible Speech, the Diverse Oneness of Writing Systems.
  • Paul Varley and Kumakura Isao, eds., Tea in Japan: Essays on the History of Chanoyu. Reviewed by William R. LaFleur.
  • Vladimir N. Basilov, ed., Nomads of Eurasia. Reviewed by David A. Utz.

Reviews by the Editor

  • “Philosophy and Language.” A review article of François Jullien, Procès ou Création: Une introduction à la pensée des lettrés chinois.

Language and Linguistics

  • W. South Coblin, A Handbook of Eastern Han Sound Glosses.
  • Weldon South Coblin. A Sinologist’s Handlist of Sino-Tibetan Lexical Comparisons.
  • ZHOU Zhenhe and YOU Rujie. Fangyan yu Zhongguo Wenhua [Topolects and Chinese Culture].
  • CHOU Fa-kao. Papers in Chinese Linguistics and Epigraphy.
  • ZENG Zifan. Guangzhouhua Putonghua Duibi Qutan [Interesting Parallels between Cantonese and Mandarin].
  • Luciana Bressan. La Determinazione delle Norme Ortografiche del Pinyin.
  • JIANG Shaoyu and XU Changhua, tr. Zhongguoyu Lishi Wenfa [A Historical Grammar of Modern Chinese] by OTA Tatsuo.
  • McMahon, et al. Expository Writing in Chinese.
  • P. C. T’ung and D. E. Pollard. Colloquial Chinese.
  • Li Sijing, Hanyu “er” Yin Shih Yanjiu [Studies on the History of the “er” Sound in Sinitic].
  • Maurice Coyaud, Les langues dans le monde chinois.
  • Patricia Herbert and Anthony Milner, eds., South-East Asia: Languages and Literatures; A Select Guide.
  • Andrew Large, The Artificial Language Movement.
  • Wilhelm von Humboldt, On Language: The Diversity of Human Language-Structure and Its Influence on the Mental Development of Mankind.
  • Vitaly Shevoroshkin, ed., Reconstructing Languages and Cultures.
  • Jan Wind, et al., eds., Studies in Language Origins.

Short Notices

  • A. Kondratov, Sounds and Signs.
  • Jeremy Campbell, Grammatical Man: Information, Entropy, Language, and Life.
  • Pitfalls of the Tetragraphic Script.

Lexicography and Lexicology

  • MIN Jiaji, et al., comp., Hanyu Xinci Cidian [A Dictionary of New Sinitic Terms]
  • LYU Caizhen, et al., comp., Xiandai Hanyu Nanci Cidian [A Dictionary of Difficult Terms in Modern Sinitic].
  • Tom McArthur, Worlds of Reference: Lexicography, learning and language from the clay tablet to the computer.

A Bouquet of Pekingese Lexicons

  • JIN Shoushen, comp., Beijinghua Yuhui [Pekingese Vocabulary].
  • SONG Xiaocai and MA Xinhua, comp., Beijinghua Ciyu Lishi [Pekingese Expressions with Examples and Explanations].
  • SONG Xiaocai and MA Xinhua, comp., Beijinghua Yuci Huishi [Pekingese Words and Phrases with Explanations].
  • FU Min and GAO Aijun, comp., Beijinghua Ciyu (Dialectical Words and Phrases in Beijing).

A Bibliographical Trilogy

  • Paul Fu-mien Yang, comp., Chinese Linguistics: A Selected and Classified Bibliography.
  • Paul Fu-mien Yang, comp., Chinese Dialectology: A Selected and Classified Bibliography.
  • Paul Fu-mien Yang, comp., Chinese Lexicology and Lexicography: A Selected and Classified Bibliography.

Orality and Literacy

  • Jack Goody. The interface between the written and the oral.
  • Jack Goody. The logic of writing and the organization of society.
  • Deborah Tannen, ed., Spoken and Written Language: Exploring Orality and Literacy.

Society and Culture

  • Scott Simmie and Bob Nixon, Tiananmen Square.
  • Thomas H. C. Lee, Government Education and Examinations in Sung China.
  • ZHANG Zhishan, tr. and ed., Zhongguo zhi Xing [Record of a Journey to China].
  • LIN Wushu, Monijiao ji Qi Dongjian [Manichaeism and Its Eastward Expansion].
  • E. N. Anderson, The Food of China.
  • K. C. Chang, ed., Food in Chinese Culture: Anthropological and Historical Perspectives.
  • Jacques Gernet, China and the Christian Impact: A Conflict of Cultures.
  • D. E. Mungello, Curious Land: Jesuit Accommodation and the Origins of Sinology.

Short Notice

  • Robert Jastrow, The Enchanted Loom: Mind in the Universe.

In Memoriam
Chang-chen HSU
August 6, 1957 – June 27, 1989

  • Hsu Chang-chen, ed., and tr., Yin-tu hsien-tai hsiao-shuo hsüan [A Selection of Contemporary Indian Fiction].
  • Hsu Chang-chen, T’o-fu tzu-hui yen-chiu (Mastering TOEFL Vocabulary).
  • Hsu Chang-chen, Tsui-chung-yao-te i pai ke Ying-wen tzu-shou tzu-ken (100 English Prefixes and Word Roots).
  • Hsu Chang-chen, Fa-wen tzu-hui chieh-kou fen-hsi: tzu-shou yü tzu-ken (Les préfixes et les racines de la langue française).
  • Hsu Chang-chen, comp. and tr., Hsi-yü yü Fo-chiao wen-shih lun-chi (Collection of Articles on Studies of Central Asia, India, and Buddhism).

This is SPP no. 14, from December 1989. The entire text is now online as a 7.3 MB PDF.

See my earlier post for the contents of the first SPP volume of reviews and a link to the full volume.

reviews of books related to China and linguistics

Sino-Platonic Papers has just released online its first compilation of book reviews. Here is a list of the books discussed. (Note: The links below do not lead to the reviews but to other material.)

Invited Reviews

  • J. Marshall Unger, The Fifth Generation Fallacy. Reviewed by Wm. C. Hannas
  • Rejoinder by J. Marshall Unger
  • Hashimoto Mantaro, Suzuki Takao, and Yamada Hisao. A Decision for the Chinese Nations: Toward the Future of Kanji (Kanji minzoku no ketsudan: Kanji no mirai ni mukete). Reviewed by Wm. C. Hannas
  • S. Robert Ramsey. The Languages of China. Reviewed by Wm. C. Hannas
  • James H. Cole, Shaohsing. Reviewed by Mark A. Allee
  • Henry Hung-Yeh Tiee, A Reference Grammar of Chinese Sentences. Reviewed by Jerome L. Packard

Reviews by the Editor

  • David Pollack, The Fracture of Meaning
  • Jerry Norman, Chinese
  • N. H. Leon, Character Indexes of Modern Chinese
  • Shiu-ying Hu, comp., An Enumeration of Chinese Materia Medica
  • Donald M. Ayers, English Words from Latin and Greek Elements
  • Chen Gang, comp., A Dictionary of Peking Colloquialisms (Beijing Fangyan Cidian)
  • Dominic Cheung, ed. and tr., The Isle Full of Noises
  • Jonathan Chaves, ed. and tr., The Columbia Book of Later Chinese Poetry
  • Philip R. Bilancia, Dictionary of Chinese Law and Government
  • Charles O. Hucker, A Dictionary of Official Titles in Imperial China
  • Robert K. Logan, The Alphabet Effect
  • Liu Zhengtan, Gao Mingkai, et al., comp., A Dictionary of Loan Words and Hybrid Words in Chinese (Hanyu Wailai Cidian)
  • The Mandarin Daily Dictionary of Loan Words (Guoyu Ribao Wailaiyu Cidian)
  • Shao Xiantu, Zhou Dingguo, et al., comp., A Dictionary of the Origins of Foreign Place Names (Waiguo Diming Yuyuan Cidian)
  • Tsung-tung Chang, Metaphysik, Erkenntnis und Praktische Philosophie um Chuang-Tzu
  • Irene Bloom, trans., ed., and intro., Knowledge Painfully Acquired: The K’un-chih chi of Lo Ch’in-shun
  • Research Institute for Language Pedagogy of the Peking College of Languages, comp., Frequency Dictionary of Words in Modern Chinese (Xiandai Hanyu Pinlyu Cidian)
  • Liu Yuan, chief compiler, Word List of Modern Mandarin (Xiandai Hanyu Cibiao)
  • The Editing Group of A New English-Chinese Dictionary, comp., A New English-Chinese Dictionary
  • BBC External Business and Development Group, Everyday Mandarin

This is SPP no. 8, from February 1988. The entire text is now online as a 4.2 MB PDF.

‘dialect’ and ‘Chinese’ from a linguistic point of view

Another back issue of Sino-Platonic Papers has been released, this one of particular relevance to the themes of this site: What Is a Chinese “Dialect/Topolect”? Reflections on Some Key Sino-English Linguistic Terms (1991), by Professor Victor H. Mair of the University of Pennsylvania’s Department of East Asian Languages and Civilizations.

Here is the abstract:

Words like fangyan, putonghua, Hanyu, Guoyu, and Zhongwen have been the source of considerable perplexity and dissension among students of Chinese language(s) in recent years. The controversies they engender are compounded enormously when attempts are made to render these terms into English and other Western languages. Unfortunate arguments have erupted, for example, over whether Taiwanese is a Chinese language or a Chinese dialect. In an attempt to bring some degree of clarity and harmony to the demonstrably international fields of Sino-Tibetan and Chinese linguistics, this article examines these and related terms from both historical and semantic perspectives. By being careful to understand precisely what these words have meant to whom and during which period of time, needlessly explosive situations may be defused and, an added benefit, perhaps the beginnings of a new classification scheme for Chinese language(s) may be achieved. As an initial step in the right direction, the author proposes the adoption of “topolect” as an exact, neutral translation of fangyan.

The entire text is now online as a 2.2 MB PDF: What Is a Chinese “Dialect/Topolect”? Reflections on Some Key Sino-English Linguistic Terms.

Strongly recommended.

Cris-atunity revisited

Benjamin Zimmer of Language Log has had a couple of recent posts on the crisis = danger + opportunity myth. First, in Stop him before he tropes again, he takes Al Gore to task for repeating the myth (again).

Then Zimmer posted his findings that the myth “was in use among Christian missionaries in China as early as 1938 and creeping into American public discourse by 1940.” (See Crisis = danger + opportunity: The plot thickens.) Nice work!

Meanwhile, Gary Feng of Shadow has voiced a dissenting position that “the urban myth has some kernel of truth in it.”

Orientalism and Chinese characters: the case of ‘busyness’

Professor Victor H. Mair has sent me another piece along the lines of his popular essay danger + opportunity ≠ crisis.

The new piece discusses a misinterpretation of the nature of the Chinese character for máng (“busy”).

Since the entire essay is just a few paragraphs long, I won’t excerpt from it here but simply encourage everyone to read the whole thing: busyness ≠ heart + killing.

For related examples of this fanciful approach to etymology that Mair exposes, see misunderstandings of biblical proportions. And for a detailed explanation of how Chinese characters really do function, see Chinese.

mother-bleeping X’s

Taiwanese movie poster for the Western film 'Severance' (斷頭氣). It contains the line '員工旅遊變生死遊戲 真他X的煩 Orz'

Language Log has had quite a few posts in recent months on the bleeping out of letters from obscenities. I’d like to add here an example of something bleeped out of a string of Chinese characters.

The other day I noticed an ad on the side of a bus for the forthcoming British slasher film Severance. (I didn’t get a good photo of this ad, so here I’m using an image of the poster for this movie.) In Mandarin this has the rather uninspired title of Duàntóu qì (斷頭氣: “Severed Head Qi”).

What really caught my eye, however, was the tag line in Chinese characters:

員工旅遊變生死遊戲 真他X的煩 Orz

This is interesting not just for the use of Orz, which is Net slang, but also for the bleeping out of the middle character of the obscenity tāmā de (他媽的, sometimes seen as “tamade”), rendering it 他X的. (Note too that a Roman letter rather than a Chinese character was used for this.)

There’s nothing obscene about the middle character by itself (媽). It’s used in writing words related to mā (“mother”). For that matter, there’s nothing in the least impolite about any of the characters by themselves or the individual morphemes they represent. The phrase as a whole literally means simply “his mother’s.” But as a whole the phrase works as something that youngsters would get into trouble for saying around their parents or elders and that would probably not be used on television (not without bleeping the subtitles, at least).

Lu Xun (Lǔ Xùn/鲁迅/魯迅) wrote a brief essay about the expression tama de. (For an English translation and notes of Lu Xun’s tama de essay, see Lu Xun on the Chinese “national swear”, an excellent post by Huichieh Loy of From a Singapore Angle.)

Back to the bleeping. As the results of Google searches show, 他媽的 and 他X的 are both common, though the original form is much more so.

search term | total of all domains | within .cn domains | within .tw domains
他X的 | 98,100 | 22,700 | 6,960
他媽的 | 1,910,000 | 173,000 | 903,000

Note that .cn (PRC) domains have 23.14% of the total 他X的s but only 9.06% of the total 他媽的s. This difference is probably a result of China’s Net nanny culture. On the other hand, specifically PRC domains still have a lot of 他媽的s. (Or rather 他妈的s, using the so-called simplified form of 媽.) Taiwan domains, however, have more than five times as many, which in the spirit of this post I should probably call a fucking lot of 他媽的s.
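
For anyone who wants to check those percentages, here is a small sketch in Python. The counts are copied from the search results above; the variable and function names are just ones I made up for illustration.

```python
# Search-result counts copied from the table in this post.
counts = {
    "他X的": {"total": 98_100, "cn": 22_700, "tw": 6_960},
    "他媽的": {"total": 1_910_000, "cn": 173_000, "tw": 903_000},
}


def share(part, total):
    """Return part as a percentage of total, rounded to two decimals."""
    return round(100 * part / total, 2)


for term, c in counts.items():
    print(term, ".cn share:", share(c["cn"], c["total"]), "%")

# How many times more 他媽的 hits are in .tw domains than in .cn domains.
tw_to_cn = counts["他媽的"]["tw"] / counts["他媽的"]["cn"]
print("他媽的 .tw/.cn ratio:", round(tw_to_cn, 2))
```

The .cn shares come out to 23.14% and 9.06%, and the .tw count for 他媽的 is a little over five times the .cn count, matching the figures given above.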

Out of curiosity I also ran searches for the other letters of the alphabet and found a spike for 他M的. The letter M serves here as an abbreviation for the ma of tama de. Accordingly, it’s no surprise to see that 他ma的 is also found and that both 他M的 and 他ma的 are relatively rare in .tw domains (since people in Taiwan aren’t taught romanization).

search term | total of all domains | within .cn domains | within .tw domains
他M的 | 21,200 | 4,220 | 128
他ma的 | 12,400 | 2,620 | 168

To my surprise, I also came across a lesser spike for the use of the letter Y: 他Y的

search term | total of all domains | within .cn domains | within .tw domains
他Y的 | 8,450 | 1,520 | 14

The 他Y的s are mainly referring to a sadistic Flash game Pìpì chōu tā Y de (屁屁抽他Y的).

But it appears this isn’t really intended to be the letter Y from the Roman alphabet. Instead, Y appears to be used in place of zhuyin fuhao’s similar-looking ㄚ, which represents the sound that Hanyu Pinyin assigns to the Roman letter A. Thus, 他Y的 is not read “ta Y de” but more like “taaa de.” (See Some Things Chinese Characters Can’t Do-Be-Do-Be-Do.) Oddly enough, there are thousands of pages with 他Y的 (Roman letter Y) but just a handful with 他ㄚ的 (bopo mofo ㄚ). This may be from the relative ease of typing the letter Y instead of zhuyin’s ㄚ. Another odd result is that many of the 他ㄚ的s are within .cn domains but in traditional Chinese characters. [Later addition: See the comments for clarification on this.]

Since the subject of zhuyin fuhao came up, I made some additional searches:

search term | total of all domains | within .cn domains | within .tw domains
ㄊㄚㄇㄚㄉㄜ | 0 | 0 | 0
他ㄇㄚ的 | 142 | 0 | 55
他ㄇ的 | 3,820 | 16 | 1,410
ㄊㄇㄉ | 408 | 0 | 2

“TMD” is another extremely common way to indicate tama de. But too many unrelated results turn up in searches for me to give useful numbers for this.

OK, I’m finally finished with this tama de post.

Essay in Hanyu Pinyin

Although I have a few texts here on Pinyin Info written in Pinyin, most of them aren’t long and are usually conversions from texts written in Chinese characters. So it is with very great pleasure that I announce the Internet release of an extensive and important essay by Zhang Liqing (張立青,张立青) that was written in Pinyin originally: Hànzì Bù Tèbié Biǎoyì.

Here is the opening:

Dàduōshù huì Hànzì de rén rènwéi Hànzì shì biǎoyì wénzì. Jiù shì shuō Hànzì gēn biéde wénzì bù yīyàng, bùbì yīkào fāyīn huòzhě biéde yǔyán tiáojiàn; yī ge rén zhǐyào xuéhuì le hěn duō Hànzì, kànjian Hànzì xiě de dōngxi jiù zhīdao shì shénme yìsi.

Zhè dàduōshù rén yòu kàndào liǎng jiàn shìqing. Dì-yī, Hànzì zài Zhōngguó liánxù yòng le sānqiān duō nián, bìngqiě dào xiànzài hái zài yòng. Dì-èr, Hànzì zài Dōng-Yà jǐ ge guójiā liúchuán le hěn cháng yī duàn shíjiān. Yúshì, tāmen yǒu tuīxiǎng chū liǎng ge jiélùn. Yī ge shuō Hànzì chāoyuè shíjiān; lìngwài yī ge shuō Hànzì chāoyuè kōngjiān. Guībìng qǐlai jiù shì Hànzì biǎoyì, kěyǐ chāoyuè shí-kōng. Zuìhòu gèng jìnyībù, bǎ Hànyǔ yě lājìnlái, shuō Hànzì zuì shìhé Hànyǔ.

Shàngmiàn de kànfǎ hé jiélùn “gēn shēn dì gù”, dànshì bùxìng dōu hěn piànmiàn, bù fúhé zhēnzhèng qíngkuàng. Wèishénme ne? Hěn jiǎndān….

Nothing would make me happier than for Mandarin teachers the world over to distribute this work to their students, for it’s much more than an exercise in Pinyin; it’s an essay with important points to make about the nature of Chinese characters. (And, yes, O teachers of the world, the copyright terms do allow you to reprint this.)

This essay appeared originally in 1991, in the Sino-Platonic Papers release of Schriftfestschrift: Essays on Writing and Language in Honor of John DeFrancis on His Eightieth Birthday, so some of you may have seen it already. But the full Schriftfestschrift is a whopping 15 MB, while this essay is a more manageable 759 KB PDF.

This special release of this article is in honor of the seventieth birthday this month of Zhang, some of whose work appears here at Pinyin Info. So, after reading Hanzi bu tebie biaoyi, I recommend that you turn to her translations of Lü Shuxiang (first seen here on this site!) and Zhou Youguang:

Those readings are also available in the original Mandarin:

In addition to being a writer, educator, and translator, Zhang is an associate editor of the excellent ABC Chinese-English Comprehensive Dictionary, which is by far my favorite Mandarin-English dictionary.

Happy birthday, Liqing!