Chinese literacy

I remain amazed by how many people are willing to take China’s official statistics at face value. Yet news story after news story refers to China’s supposedly high literacy rate.

If you know any Chinese characters, try to see how many of the following items you can pronounce. (But even if you don’t know any Chinese characters, please keep reading.) The pronunciation needn’t be in Mandarin if you speak another Sinitic language. Moreover, if you aren’t sure how to pronounce some characters but know the meaning of the word nonetheless, give yourself full credit for that item anyway. Characters following a slash are, of course, “simplified” forms.

  1. 一萬 / 一万
  2. 姓名
  3. 糧食 / 粮食
  4. 函數 / 函数
  5. 肆虐
  6. 雕琢
  7. 彳亍
  8. 舛謬 / 舛谬
  9. 耆耄
  10. 饕餮

Scroll down for the answers and more information.

For reference, I have added the frequency rank of each character used. Beyond the 3,000 or so most frequently used characters, however, frequency figures are difficult to come by and unreliable, because these characters occur so rarely. Of course, this doesn’t mean such characters can be ignored completely. They do still occur, and at present Chinese orthography doesn’t allow Hanyu Pinyin to be inserted into a string of characters the way furigana or other non-kanji scripts can be used in Japanese.

If your score fell short of 10, perhaps you’d like to know that the median for PRC university graduates was 6.

Characters | Pinyin | English | % not responding correctly | Frequency rank of 1st character | Frequency rank of 2nd character
一萬 | yīwàn | ten thousand | 19.1 | 2 | 209
姓名 | xìngmíng | full name | 22.3 | 1,025 | 137
糧食 | liángshi | grain; cereals; food | 23.6 | 1,086 | 527
函數 | hánshù | function (math) | 50.9 | 2,236 | 229
肆虐 | sìnüè | ravage; devastate; be rampant | 65.8 | 2,460 | c. 3,000
雕琢 | diāozhuó | cut and polish (jade/etc.); carve; write in an ornate style | 62.0 | 1,919 | 2,511
彳亍 | chìchù | walk slowly | 98.6 | X | X
舛謬 | chuǎnmiù | error; mishap | 98.3 | X | 2,560
耆耄 | qímào | octogenarian | 98.3 | X | X
饕餮 | tāotiè | a mythical ferocious animal; fierce and cruel person; a glutton; sb. of insatiable cupidity | 99.4 | X | X
(X = no reliable frequency rank; the character falls outside the commonly ranked characters.)
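The frequency ranks above make it easy to check which quiz items are built entirely from very common characters. Here is a minimal sketch that flags items whose two characters both rank within the top 1,500, using Beijing's 1,500-character rural literacy threshold as the cutoff. It assumes that "knowing 1,500 characters" roughly means knowing the 1,500 most frequent ones, and that all ranks come from a single frequency list; the table doesn't say which list was used.

```python
# Frequency ranks copied from the table above; None stands for "X"
# (no reliable rank). The "c. 3,000" for 虐 is entered as 3000.
ITEMS = {
    "一萬": (2, 209),
    "姓名": (1025, 137),
    "糧食": (1086, 527),
    "函數": (2236, 229),
    "肆虐": (2460, 3000),
    "雕琢": (1919, 2511),
    "彳亍": (None, None),
    "舛謬": (None, 2560),
    "耆耄": (None, None),
    "饕餮": (None, None),
}

RURAL_THRESHOLD = 1500  # characters required of rural "literates"


def within_threshold(ranks, threshold=RURAL_THRESHOLD):
    """True only if every character has a known rank at or below threshold."""
    return all(r is not None and r <= threshold for r in ranks)


# Dicts keep insertion order, so the result follows the quiz order.
basic_items = [word for word, ranks in ITEMS.items() if within_threshold(ranks)]
print(basic_items)  # ['一萬', '姓名', '糧食']
```

Only the first three items pass; 函數 already falls outside because 函 ranks 2,236.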

These were used in a test of literacy in the PRC that was part of a 1996 “stratified national probability sample” of some 6,000 adults aged 20-69. Care was taken in the selection of those interviewed, so that “for all practical purposes, we have representative national samples of China’s rural and urban populations,” according to Donald J. Treiman, who gives the results of this study in The Growth and Determinants of Literacy in China. For more on this study, which was a monumental undertaking, see Treiman’s Life Histories and Social Change in Contemporary China: Provisional Codebook. UCLA has made available much of the data for the study.

The selection of words, alas, was not particularly good, especially the choice of so many items from Literary Sinitic (classical Chinese). At least three of the final four words should have been tossed out in favor of more examples within the 2,000 or 3,000 most commonly used characters. Nevertheless, the data can be used to provide hints of the true extent of illiteracy in China.

In 1996 China’s adult literacy rate (15+) was about 85 percent, according to Beijing. (The age range for literacy in China is not always clear. Sometimes it refers to all adults. Sometimes it excludes the elderly, whose rate of illiteracy is much higher than that of people born more recently. Sometimes it excludes everyone born before the founding of the PRC on October 1, 1949. And sometimes other limits are used.) The threshold for literacy was recognition of 1,500 characters for a rural inhabitant and 2,000 characters for a “worker or staff member employed by an enterprise or institution or any urban resident.” (One country, two literacy thresholds?) (As most students of Mandarin could tell you, 1,500 characters isn’t going to provide anything resembling full literacy. Far too many characters in texts will be unknown, and far too much of a native speaker’s vocabulary will be unwritable with just 1,500 characters, in stark contrast to literacy through Pinyin, which would be easier to obtain and far more complete.) Moreover, literates were to be able to “read popular magazines and essays, to keep simple accounts, and to write simple essays.” Yet at the same time some one-fifth of China’s adult population could not recognize even such a common and simple word as yīwàn when written in extremely common and relatively simple Chinese characters (一万). In addition, the characters in 姓名 (xìngmíng) and 糧食 (liángshi) are also well within the 1,500 most frequently used characters and should thus be known by all literate Chinese. The cumulative percentage of respondents who answered two or fewer items correctly, and who therefore could not have identified all three of these words, is 24 percent (see table below). That points to a literacy rate no greater than 76 percent, which is considerably less than the 85 percent the government was claiming.

Number of correct responses | Percentage | Cumulative percentage
0 | 19.0 | 19.0
1 | 3.0 | 22.0
2 | 2.0 | 24.0
3 | 23.2 | 47.2
4 | 12.6 | 59.8
5 | 11.7 | 71.5
6 | 25.1 | 96.6
7 | 2.2 | 98.8
8 | 0.7 | 99.5
9 | 0.4 | 99.9
10 | 0.1 | 100.0
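The cumulative column, and the median for the whole sample, can be recomputed from the per-score percentages in the table above. This is a small sketch; the function names are my own, and the median here is simply the lowest score at which the cumulative percentage reaches 50.

```python
# Percentage of respondents by number of correct answers (0-10), from the table.
PERCENT_BY_SCORE = [19.0, 3.0, 2.0, 23.2, 12.6, 11.7, 25.1, 2.2, 0.7, 0.4, 0.1]


def cumulative(percentages):
    """Running totals, rounded to one decimal place as in the table."""
    total = 0.0
    out = []
    for p in percentages:
        total += p
        out.append(round(total, 1))
    return out


def median_score(percentages):
    """Lowest score whose cumulative percentage reaches 50."""
    for score, cum in enumerate(cumulative(percentages)):
        if cum >= 50:
            return score
    raise ValueError("percentages never reach 50")


cum = cumulative(PERCENT_BY_SCORE)
print(cum[2])  # 24.0: scored 0-2, so cannot have gotten all of items 1-3
print(median_score(PERCENT_BY_SCORE))  # 4
```

So the median for all adults in the sample is 4 correct answers, compared with 6 for university graduates.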

A couple more factors need to be considered. First, Treiman’s study took roughly equal samples from China’s rural and urban populations (3,087 urban residents and 3,003 rural residents). But in 1996 about 75 percent of China’s population lived in rural areas, where literacy tends to be significantly lower than in the cities:

Relative to those who at age 14 had rural hukou status and resided in a village, those with urban hukou status residing in cities would be expected, on average, to be eight percentile points higher on the literacy scale. That is, the difference between the two extreme residential circumstances for otherwise similar people is the equivalent of about 1.6 years of schooling. (Treiman, p. 9)

Thus, because the relatively literate urban population is overrepresented, the literacy figure needs to be adjusted down from the 76 percent given earlier. (Sorry, I’m not much good at the math of adjusting sampling rates, so I’ll give a rough figure.) So now it’s at, say, 72 percent, which would give an illiteracy rate of 28 percent, nearly twice the 15 percent China was claiming (and which I think still underestimates the difference between “literacy” in the cities and the countryside). But the picture is bleaker still.
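For anyone who wants to do this adjustment more carefully than my rough guess, the standard approach is post-stratification: compute the rate separately for urban and rural respondents, then weight each group by its share of the actual population rather than its share of the sample. The sketch below uses Treiman's sample sizes (3,087 urban, 3,003 rural) and the 75 percent rural share mentioned above. The urban and rural pass rates are HYPOTHETICAL placeholders, since this post doesn't give the breakdown by residence; they were picked only so that the unweighted figure lands near 76 percent. The real stratum figures would have to come from the UCLA data.

```python
def sample_rate(urban_rate, rural_rate, n_urban=3087, n_rural=3003):
    """Unweighted rate implied by Treiman's near-equal urban/rural samples."""
    return (n_urban * urban_rate + n_rural * rural_rate) / (n_urban + n_rural)


def reweighted_rate(urban_rate, rural_rate, rural_share):
    """Population-weighted rate: each stratum weighted by its population share."""
    return rural_share * rural_rate + (1 - rural_share) * urban_rate


# HYPOTHETICAL stratum pass rates, for illustration only (not Treiman's figures).
urban, rural = 0.84, 0.68

print(round(sample_rate(urban, rural), 3))
print(round(reweighted_rate(urban, rural, rural_share=0.75), 3))
```

The larger the gap between urban and rural rates, the further the reweighted figure drops below the unweighted one.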

Another factor that cannot be overlooked is that real literacy, even by China’s own limited definition, requires the ability to write, not just read. Remembering how to write Chinese characters accurately, however, is much more difficult than the already difficult task of being able to recognize at least 1,500 of them passively. With this in mind, even doubling the illiteracy rate would not be extreme, I believe. This would yield an actual literacy rate below 50 percent.

Although this method leaves much to be desired, I believe its results better represent reality than official figures.

Literacy has been measured in China primarily according to the quantity of characters recognized (known) by an individual, normally 1,500 characters for rural dwellers and 2,000 characters for urban residents and rural leaders. These measures are not verified directly during a national census. Rather, survey teams note educational attainment and check illiteracy-eradication certificates. County level education departments or work units (danwei) are responsible for assessing through surveys or tests the literacy of and awarding literacy certificates to individuals who have not completed the fourth grade of six-year primary school, the third grade of five-year primary school, or an intensive primary school. — China Country Study, n. 5

Thus, the completion of as little as three years of primary school is enough to get someone listed automatically as literate, regardless of their actual literacy. Although that might be good enough to serve as a measure of basic literacy in a language that uses an alphabet, it isn’t when dealing with Chinese characters, which not only take many years to learn but also require a great deal of reinforcement through practice lest the learner lapse back into illiteracy. Other people are listed as literate based on possession of an illiteracy-eradication certificate. These certificates, however, are awarded by authorities at the county level or at a person’s danwei, and inflation of figures at the local and danwei levels is common. The reasons for this can be summed up as “Individuals worry about punishment, officials worry about performance assessment, and enterprises worry about additional charges.”

(For an excellent look at how state planning and the use of statistics tend to become perverted under certain systems, see Dictatorship, State Planning, and Social Theory in the German Democratic Republic, by Peter C. Caldwell of Rice University. No, this doesn’t have anything to do with literacy or China, but many aspects of socialist planning in the former East Germany were the same as in the PRC.)

Before I close this unusually long post, I’d like to return for a moment to the characters in the literacy quiz. Note the approximate number of strokes in the various characters. Having only a few strokes doesn’t necessarily make a Chinese character “easy” to know. 彳 and 亍 have but three strokes each, while 糧 and 食 have a total of 27. Yet more than 50 times as many people could identify the latter pair as could identify the former. The so-called simplification of Chinese characters did not, and could not, make Chinese characters simple to know or use.
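The "more than 50 times" figure follows from the quiz table: 23.6 percent missed 糧食, so 76.4 percent got it right, while only 1.4 percent got 彳亍. Here is a quick check that also totals the strokes. The stroke counts are the standard ones for these characters and are not taken from Treiman's study.

```python
# Percentage NOT responding correctly, from the quiz table.
PCT_WRONG = {"糧食": 23.6, "彳亍": 98.6}
# Standard stroke counts of the individual characters.
STROKES = {"彳": 3, "亍": 3, "糧": 18, "食": 9}


def pct_correct(word):
    return 100 - PCT_WRONG[word]


def total_strokes(word):
    return sum(STROKES[ch] for ch in word)


ratio = pct_correct("糧食") / pct_correct("彳亍")
print(total_strokes("彳亍"), total_strokes("糧食"))  # 6 27
print(f"{ratio:.1f}")  # 54.6
```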

A few words on the China Country Study cited above. This uses official (i.e., inflated and otherwise inaccurate) figures from the PRC. But it covers a wide enough range to be quite useful. It also has a very good bibliography of English sources. But all those pages about literacy — this is a long report — and not even a mention of how damn much trouble Chinese characters are. And essentially nothing about pinyin, either. Very strange.

Here’s its table of contents:

  1. Introduction: A Snapshot of Literacy and Illiteracy in China
  2. Literacy and Illiteracy in the Chinese Context: Historical and Contemporary Patterns of Literacy Provision
    • A Chronology of Literacy Policy, Definitions and Practice: 1905-2005
      • 1905-1949: Literacy for Saving, Securing, and Strengthening China
      • 1949-1976: Language Reform, Literacy for Collectivization and Production, and the Unequal Expansion of Schooling
      • 1978-1988: Literacy and the Modernization Decade: “Blocking, Eradicating, and Raising”
      • 1988-2005: Literacy for and Assimilation of the Margins
  3. Minority Nationalities, Languages, and Literacy
  4. Remaining Barriers to Literacy for All
  5. Trends in Literacy and Illiteracy Across Regional and Rural-Urban Divides and Across Gender, Ethnicity, Income, and Disability
    • Literacy and Gender
    • Literacy and national minority populations
    • Literacy and disabled populations
    • A Rough Check on the Taken-for-Granted Mathematics of Chinese Literacy
  6. Conclusion: Future Outlook and Challenges for Literacy in China
  7. Bibliography

If you’d like references other than in the study above, Barend ter Haar has compiled an annotated bibliography on literacy, writing and education in Chinese culture.

And, finally, John DeFrancis has some important things to say on this topic in The Chinese Language: Fact and Fantasy, especially the chapter “The Successfulness Myth.”

Latin and Greek in U.S. schools

Mark Liberman at Language Log mentions the role Latin and Greek used to play in education (Old school), which is as good an excuse as any to post some graphs I made a few weeks ago from data (not current, alas) on Latin and Greek Enrollments in America’s Schools and Colleges. (I’m trying to post my backlog before Chinese New Year. And maybe then I can finally answer all the mail and comments that have been piling up.)

Note that the three graphs are not drawn to the same scale.

Latin as a Percentage of Enrollments, Grades 9-12
[Graph: percentage of U.S. high school students enrolled in Latin courses, by year, showing a steep decline from the mid-1960s to the mid-1970s.]

Latin as a Percentage of College Enrollments
[Graph: percentage of U.S. college students enrolled in Latin courses, by year, showing a steep decline from the late 1960s to the mid-1970s.]

Greek as a Percentage of College Enrollments
[Graph: percentage of U.S. college students enrolled in Greek courses, by year.]

The numbers differ somewhat in another paper from the same source: Foreign Language Enrollments in United States Institutions of Higher Education, Fall 2002 (PDF). This also gives data for many other languages.

I was going to have this lead into a discussion on the role of Classical Chinese in education in Taiwan, but I’m too far behind. So this makes two entries in a row without a direct tie-in to this site’s theme. Sorry about that.

Microsoft, Dzongkha, and “dialects”

Dzongkha, the national language of Bhutan, has been relegated to the status of a dialect of Tibetan in Microsoft products. Rather than being labelled “Dzongkha” or “Bhutan-Dzongkha,” it is identified as “Tibetan – Bhutan” in the recently released beta version of Windows Vista. This is apparently an official Microsoft policy, likely aimed at appeasing China.

Microsoft has barred the use of the Bhutanese government’s official term for the Bhutanese language, Dzongkha, in any of its products, citing that the term had affiliations with the Dalai Lama. In an internal memorandum, Microsoft employees were told not to use the term Dzongkha in any Microsoft software, language lists or promotional materials since “Doing so implies affiliation with the Dalai Lama, which is not acceptable to the government of China. In this instance, replace ‘Dzongkha’ with ‘Tibetan – Bhutan’.”

The Kingdom of Bhutan is situated in the Himalayas between India and Tibet. The state religion is the Drukpa Kagyu school of Tibetan Buddhism and Dzongkha is the official language. Dzongkha has a linguistic relationship to modern Tibetan in a similar way to that between Spanish and Italian.

The use of the word Dzongkha was graded by Microsoft as a ‘ship-stopper’, which means that a product may not be produced in any form until the problem is resolved. Microsoft has four levels of error severity, ship-stopper being the most severe.

Likely uses of the term may have been in Language Lists for Microsoft products, particularly the upcoming release of the next version of the Microsoft Windows operating system, Windows Vista. (Source: Microsoft Sensitive to Chinese Pressure on Bhutan Tibet Link, Tibet News.)

I didn’t know anything about Dzongkha, so I did some searching and found this:

Dzongkha is the modern Bhutanese vernacular language derived from Old Tibetan through many centuries of separate evolution on Bhutanese soil. Modern Dzongkha differs from Classical Tibetan as much as modern French does from Classical Latin. Only a few decades ago, the first attempts were undertaken to write in the vernacular in Bhutan, and the strong liturgical tradition in Bhutan has maintained the use of Classical Tibetan as the literary language to the present day. (source)

If this is accurate, the situation sounds familiar: a literary language (Classical Chinese in China, Classical Tibetan in Bhutan, Latin in Europe) continued to be used long after it was no longer spoken. Meanwhile, the spoken language had evolved in different ways in different places, becoming new languages (Mandarin, Cantonese, Hakka, etc., in China; Dzongkha and Tibetan in Bhutan and Tibet; French, Spanish, Italian, etc., in Europe). But because people in different locales primarily wrote in the same literary language rather than in their own modern languages, their mutually unintelligible languages were mislabeled “dialects.”

But even if everyone in Europe were to switch to writing in Latin or even Italian, that wouldn’t make French, Spanish, Portuguese, etc., “dialects.” Similarly, the use of Modern Standard Mandarin in China as the written language doesn’t mean that Mandarin, Cantonese, Hakka, Taiwanese, etc., aren’t all separate languages.

And, lest I pass over the issue of romanization, Dzongkha is written in the Tibetan script and also has an official romanization system, “Roman Dzongkha,” which makes use of all the letters of the Roman alphabet other than F, V, Q, and X. Its three diacritic marks are the apostrophe, the circumflex accent, and the diaeresis. Bhutan, however, is not expected to replace Bhutanese orthography with Roman Dzongkha.

And for Suzanne, here’s a Dzongkha keyboard.

additional source: Dzongkha: out of Windows?, Kuensel, Monday, September 26, 2005.

writing Taiwanese: language, script, and myths

I’ve been fortunate to be able to add to this site a major essay on Taiwan’s language situation, etymology, and scripts: “How to Forget Your Mother Tongue and Remember Your National Language,” by Victor H. Mair, a professor of Chinese language and literature at the University of Pennsylvania.

Here is the abstract:

The concept of guoyu (“national language”) is deeply embedded in the consciousness of everyone who has grown up in Taiwan during the past half century. Lately, however, people have begun to speak of their muyu (“mother tongue”) as being worthy of inculcation. Guoyu, of course, refers to Modern Standard Mandarin (MSM), which in China is called putonghua (“common speech”). Mandarin is not native to Taiwan, yet it is the national language of Taiwan’s citizens and is the sole official written language. In contrast, the citizens of Taiwan are discouraged from writing their native languages (viz., Taiwanese, Hakka, and various aboriginal languages) and it is only recently that it has been possible to teach them in the schools. This paper will examine the complicated processes whereby the citizens of Taiwan are transformed from speakers of their mother tongues to speakers and writers of the national language. This transformation does not rely purely on educational activities carried out in the schools, but involves political, social, and cultural factors as well. The transformation of Cantonese and Shanghainese speakers into Mandarin speakers and writers will also be examined for comparative purposes.

This, however, hardly does justice to the scope of the essay.

I strongly recommend reading this. Again, here is the link to the full essay.

Mystery of old simplified Chinese characters?

Archeologists working off the coast of Pingtan County, Fujian, have discovered a pottery-laden boat they believe dates back to the reign of the Kangxi emperor (1662-1723).

One small plate decorated with plum blossoms especially caught the attention of the researchers. On its underside is inscribed the words Shuang Long, or “double dragons”, in simplified Chinese characters. As simplified Chinese characters were adopted in printing and writing only after 1949 and the two simplified Chinese were unlikely to be any discernible pattern, experts regard this as a mystery. They can only be sure of the fact that the plate was produced more than 300 years ago during the reign of Emperor Kangxi.

In other words, “double dragons” was written 双龙 rather than the expected 雙龍.

But the use of 双 for what is pronounced shuāng in modern standard Mandarin has been around for hundreds of years. I suspect the same is true of 龙, though I lack the reference material to check this. (Someone help me out here.)

What really interests me here, though, isn’t the specifics about the dates of the forms 双 and 龙. Rather, it is the assertion that “simplified Chinese characters were adopted in … writing only after 1949,” which is incorrect. When developing the various schemes of officially sanctioned “simplified” Chinese characters, China’s script reformers took a variety of approaches. But they preferred to give sanction to forms that had already been in use for many, many years — though these forms may not have been standardized in print. Often they were used in calligraphy and, more simply, in handwritten documents.

I sometimes see assertions that people in Taiwan often use simplified characters when they write by hand. Such claims are misleading. Generally speaking, if people in Taiwan ever use “simplified” Chinese characters, they do so by continuing a centuries-old tradition, not by copying forms now standardized in China.

For example, if a person in Taiwan writes (by hand) instead of , this is simply because the use of for has been common in handwriting for ages. But if the character is printed, people in Taiwan will select the traditional style: . Quite simply, people in Taiwan aren’t moving toward using China’s simplified characters.

And, as long as I’m on the subject, I don’t think they should, either.

source: Ancient porcelain clue to maritime Silk Road (Xinhua’s “China View,” Sept. 23, 2005)

Peter Boodberg and the ideographic myth

I’ve been intrigued by Peter Boodberg since reading John DeFrancis’s account of the Creel-Boodberg debate. But only recently did I finally shell out the US$80 or so it currently costs to pick up a used copy of Boodberg’s selected works (compiled by Alvin P. Cohen).

But after receiving my book and doing some Web searches in preparation for this Pinyin News entry, I discovered that some of Boodberg’s works are available online (at least to some).

Jstor, an important online archive of scholarly journals, has all but the most recent editions of the Harvard Journal of Asiatic Studies, in which Boodberg published (between 1936 and 1957) several of his all-too-few works. While many do not have access to Jstor’s files, most of the sort of people who would be interested in reading titles like “Some Proleptical Remarks on the Evolution of Archaic Chinese” probably do — or at least know someone who does. (Try asking people at universities.) If you’re not sure if you have Jstor access or not, try any of the links in the list below.

Some works by Peter A. Boodberg available online:

Some of Boodberg’s closely argued points don’t make for easy reading, but his style should not be mistaken for dry, because he can be surprisingly direct. For example, have a look at how he introduces his refutations of some of Creel’s more naive points:

[A]s a philologist and teacher of Chinese, I am naturally perturbed by — and cannot remain indifferent to — the rise of a methodology which produces, not in comparatively innocuous special articles, but in text-books through which a new generation of sinologists is expected to be trained, puerilities such as the following….

Yeah! Alas, such puerilities still abound today, 65 years after he made those remarks.

I had wanted to post a link to the In Memoriam on Boodberg by Y.R. Chao and others, but, oddly, the original page seems to have disappeared. It would be a shame if this were lost, so I’m posting a copy of the Google cache of the above page before even that is gone.

====================================

Peter Alexis Boodberg, Oriental Languages: Berkeley
1903-1972
Agassiz Professor of Oriental Languages and Literature, Emeritus

Peter Boodberg spent his boyhood in Vladivostok, where his father was commanding general of the Czarist forces. He left Vladivostok around 1920, made his way to California via Harbin and Japan, and enrolled at Berkeley as an undergraduate. When he received the Ph.D. in Oriental languages in 1930, he was already a humanistic scholar of unusual promise, superbly equipped with a knowledge of the principal ancient and modern Indo-European, Semitic, Hamitic, Altaic, Sinitic, and Malayo-Polynesian languages, with a broad acquaintance of major world cultures, with a mind that was both strikingly original and rigorously disciplined, and with a poet’s sensitivity to the nuances of language, and for the philological studies that he thought of as “the ability to conduct significant conversations with the dead.” During his early years on the Berkeley faculty, which he joined in 1932, he attracted wide professional attention with a series of erudite articles reflecting the three major areas of interest that became his permanent concerns: Sino-Altaica, early Chinese cultural history, and the classical Chinese script. By 1940 he was chairman of the Oriental languages department, which, during the entire decade, he gradually elevated to national prominence, stamping it in the process with his own passionate concern for scholarly discipline and integrity.

In the classroom, Boodberg was stimulating and provocative. His Great Books course was known throughout the University, his courses on Chinese characters and the Asiatic languages stretched the horizons of generations of undergraduate majors, and his impact on graduate students was profound and lasting. His courses were not closely organized; rather, his effectiveness as a teacher sprang from the power of his intellect, the breadth of his learning, and his ability to kindle the imagination of students and inspire them with his own scholarly ideals.

Boodberg loved the give-and-take of intellectual debate. In the 1940s, he took the lead in organizing the Colloquium Orientologicum, a faculty group with interests spanning the Asiatic continent. In the 40s and 50s the Colloquium attracted a surprisingly wide range of participants, but its prime movers were always Boodberg and a few other eminent humanists, mostly of European origin, whose far-reaching interests and lively wit made it a forum that was perhaps unique in Berkeley’s history.

Boodberg was a delightful conversationalist. The swift play of his imagination invested the most casual encounter with an aura of unpredictability, and he could usually be counted on for an amusing anecdote (typically at his own expense), delivered with his characteristic accent and high-pitched laugh. The elegance of his diction reflected what one Russian-speaking friend called “the artistic strain in Pjotr Alekseevich.” In the spacious chambers of his mind, there was room not only for the concerns of the philologist, but also for music and poetry. He had a great admiration for Gerard Manley Hopkins, whose techniques he borrowed for his own brilliant interpretations of Tu Fu; and to those who recall the delicacy and grace of his memorial tribute to Shih-Hsiang Ch’en, it will come as no surprise to learn that he composed verses in English and Russian.

Boodberg was not what is called a productive scholar. He discussed the fruits of his research in frequent public lectures, such as his presidential addresses to the American Oriental Society and its western branch (which he helped to found), and he produced, for limited distribution, numerous short technical papers, notably his “Cedules from a Berkeley Workshop in Asiatic Philology,” which are now collectors’ items, but the ambitious scope of his research projects, coupled with a certain innate diffidence, prevented what he referred to as “premature publication.” One of his long-term interests was a bold attempt to establish a complex of Western graphic symbols to represent each of the 30,000 characters of the classical Chinese script; another was a monograph on the life of Confucius, whose disciple he sometimes jokingly proclaimed himself. Despite the warmth of his personality, he had, indeed, a Confucian dignity and sense of decorum. Few people called him Peter. He was like Confucius, also, in his conviction that the proper concern of the scholar is “the meditative treasuring up of knowledge, the unwearying pursuit of wisdom, and the timeless instruction of others,” and in the affectionate respect he inspired in students and colleagues. We who knew him will not forget him or learn to bear his loss with indifference.

He leaves his sister Valentina, his wife Elena, and his daughter Xenia, a concert pianist of whom he was touchingly proud.

Yuen Ren Chao
Yakov Malkiel
Helen McCullough

Beijing Olympics slogan

Professor Victor H. Mair of the University of Pennsylvania has just released an interesting piece analyzing the somewhat odd choice of wording for the slogan for the 2008 Olympics in China:
Remarks on the slogan for the Beijing Olympics.

Mair is also editor of Sino-Platonic Papers.

German paper on Chinese language reform

Another paper I’ve come across in my Web surfing: Ideen zur Sprachreform in China ab den ersten phonetischen Transkribtionssystemen unter besonderer Berücksichtigung des Schriftstellers Lu Xun (“Ideas on language reform in China from the first phonetic transcription systems onward, with special consideration of the writer Lu Xun”).

I don’t read German, so I can’t vouch for the correctness of the paper. Actually, from what little I can read, it seems that the author has a few all-too-common notions about “dialects”, etc. But the paper might be worthwhile anyway.