scripts related to Chinese characters — an article

The most recent rerelease from Sino-Platonic Papers is The Family of Chinese Character-Type Scripts, by Zhou Youguang, one of the main people behind the creation of Hanyu Pinyin. So it’s no surprise that his name has come up before in Pinyin News.

This article, from September 1991, categorizes and briefly discusses more than a dozen scripts derived from Chinese characters, most of which were used inside China by non-Han people.

The link above is to an HTML version. The original format of the article is preserved in the PDF file (650 KB).

the party line: some education levels in China for 2010

Here are some recent pronouncements from the PRC’s Ministry of Education.

I don’t advise taking any of this at face value. I’ve put it on my site for reference purposes only.

The national level of education will be enhanced in the next three years. High school graduates will be a resource for new employees, according to the Ministry of Education. In 2010, employees with education beyond the junior college level will account for 10 percent of new employees. This is the aim of the framework for the Program for Chinese Education in the 11th “Five-Year Plan.”

The framework outlines the complete plan to universalize nine-year compulsory education in China by 2010. The attendance rate at primary school will remain above 99 percent; the enrollment rate for junior middle school will reach over 98 percent. The illiteracy rate of the youth and middle-aged will drop to 2 percent. The enrollment rate for senior middle school will be around 80 percent. Secondary vocational education will be on the same scale with the general senior secondary schools. Students receiving a higher education will reach 30 million, with an enrollment rate of 25 percent. Adult education and continuing education will become more developed. Over 100 million urban and rural working people, annually, will be trained.

It is reported that China’s level of education is still relatively low. The average length of employee education is over three years below the average length of developed countries.

source: The national level of education to increase in next three years, People’s Daily, June 13, 2007

Critique of ordering of dictionaries for Mandarin Chinese

Sino-Platonic Papers has rereleased for free its very first issue, from February 1986: The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese: A Review Article of Some Recent Dictionaries and Current Lexicographical Projects (1.5 MB PDF), by Professor Victor H. Mair of the University of Pennsylvania’s Department of East Asian Languages and Civilizations.

This is an important essay that helped lead to the production of the ABC Chinese-English Comprehensive Dictionary, which is my favorite Mandarin-English dictionary.

Here is how it begins:

As a working Sinologist, each time I look up a word in my Webster’s or Kenkyusha’s I experience a sharp pang of deprivation. Having slaved over Chinese dictionaries arranged in every imaginable order (by K’ang-hsi radical, left-top radical, bottom-right radical, left-right split, total stroke count, shape of successive strokes, four-corner, three-corner, two-corner, kuei-hsieh, ts’ang-chieh, telegraphic code, rhyme tables, “phonetic” keys, and so on ad nauseam), I have become deeply envious of specialists in those languages, such as Japanese, Indonesian, Hindi, Persian, Russian, Turkish, Korean, Vietnamese, and so forth, which possess alphabetically arranged dictionaries. Even Zulu, Swahili, Akkadian (Assyrian), and now Sumerian have alphabetically ordered dictionaries for the convenience of scholars in these areas of research.

It is a source of continual regret and embarrassment that, in general, my colleagues in Chinese studies consult their dictionaries far less frequently than do those in other fields of area studies. But this is really not due to any glaring fault of their own and, in fact, they deserve more sympathy than censure. The difficulties are so enormous that very few students of Chinese are willing to undertake integral translations of texts, preferring instead to summarize, paraphrase, excerpt and render into their own language those passages which are relatively transparent. Only individuals with exceptional determination, fortitude, and stamina are capable of returning again and again to the search for highly elusive characters in a welter of unfriendly lexicons. This may be one reason why Western Sinology lags so far behind Indology (where is our Böthlingk and Roth or Monier-Williams?), Greek studies (where is our Liddell and Scott?), Latin studies (Oxford Latin Dictionary), Arabic studies (Lane’s, disappointing in its arrangement by “roots” and its incompleteness but grand in its conception and scope), and other classical disciplines. Incredibly, many Chinese scholars with advanced degrees do not even know how to locate items in supposedly standard reference works or do so only with the greatest reluctance and deliberation. For those who do make the effort, the number of hours wasted in looking up words in Chinese dictionaries and other reference tools is absolutely staggering. What is most depressing about this profligacy, however, is that it is completely unnecessary. I propose, in this article, to show why.

First, a few definitions are required. What do I mean by an “alphabetically arranged dictionary”? I refer to a dictionary in which all words (tz’u) are interfiled strictly according to pronunciation. This may be referred to as a “single sort/tier/layer alphabetical” order or series. I most emphatically do not mean a dictionary arranged according to the sounds of initial single graphs (tzu), i.e. only the beginning syllables of whole words. With the latter type of arrangement, more than one sort is required to locate a given term. The head character must first be found and then a separate sort is required for the next character, and so on. Modern Chinese languages and dialects are as polysyllabic as the vast majority of other languages spoken in the world today (De Francis, 1984). In my estimation, there is no reason to go on treating them as variants of classical Chinese, which is an entirely different type of language. Having dabbled in all of them, I believe that the difference between classical Chinese and modern Chinese languages is at least as great as that between Latin and Italian, between classical Greek and modern Greek or between Sanskrit and Hindi. Yet no one confuses Italian with Latin, modern Greek with classical Greek, or Sanskrit with Hindi. As a matter of fact there are even several varieties of pre-modern Chinese just as with Greek (Homeric, Horatian, Demotic, Koine), Sanskrit (Vedic, Prakritic, Buddhist Hybrid), and Latin (Ciceronian, Low, Ecclesiastical, Medieval, New, etc.). If we can agree that there are fundamental structural differences between modern Chinese languages and classical Chinese, perhaps we can see the need for devising appropriately dissimilar dictionaries for their study.

One of the most salient distinctions between classical Chinese and Mandarin is the high degree of polysyllabicity of the latter vis-a-vis the former. There was indeed a certain percentage of truly polysyllabic words in classical Chinese, but these were largely loan-words from foreign languages, onomatopoeic borrowings from the spoken language, and dialectical expressions of restricted currency. Conversely, if one were to compile a list of the 60,000 most commonly used words and expressions in Mandarin, one would discover that more than 92% of these are polysyllabic. Given this configuration, it seems odd, if not perverse, that Chinese lexicographers should continue to insist on ordering their general purpose dictionaries according to the sounds or shapes of the first syllables of words alone.

Even in classical Chinese, the vast majority of lexical items that need to be looked up consist of more than one character. The number of entries in multiple character phrase books (e.g., P’ien-tzu lei-pien [approximately 110,000 entries in 240 chüan], P’ei-wen yün-fu [roughly 560,000 items in 212 chüan]) far exceeds those in the largest single character dictionaries (e.g., Chung-hua ta tzu-tien [48,000 graphs in four volumes], K’ang-hsi tzu-tien [49,030 graphs]). While syntactically and grammatically many of these multisyllabic entries may not be considered as discrete (i.e. bound) units, they still readily lend themselves to the principle of single-sort alphabetical searches. Furthermore, a large proportion of graphs in the exhaustive single character dictionaries were only used once in history or are variants and miswritten forms. Many of them are unpronounceable and the meanings of others are impossible to determine. In short, most of the graphs in such dictionaries are obscure and arcane. Well over two-thirds of the graphs in these comprehensive single character dictionaries would never be encountered in the entire lifetime of even the most assiduous Sinologist (unless, of course, he himself were a lexicographer). This is not to say that large single character dictionaries are unnecessary as a matter of record. It is, rather, only to point out that what bulk they do have is tremendously deceptive in terms of frequency of usage.
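
The contrast Mair draws between a “single sort” arrangement and one keyed to head characters can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than anything from the article: the four sample words, the helper names `plain`, `single_sort`, and `head_character_sort`, the use of Hanyu Pinyin instead of the article’s Wade-Giles, and the use of Unicode code-point order as a stand-in for however a given dictionary orders homophonous head characters.

```python
import unicodedata

def plain(pinyin: str) -> str:
    """Lowercase and strip tone marks so that 'shì' sorts as 'shi'.
    (Simplification: this also folds ü into u.)"""
    decomposed = unicodedata.normalize("NFD", pinyin.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Hypothetical sample entries: (characters, Pinyin syllables).
WORDS = [
    ("事情", ["shì", "qing"]),
    ("是非", ["shì", "fēi"]),
    ("市場", ["shì", "chǎng"]),
    ("事實", ["shì", "shí"]),
]

def single_sort(words):
    """One sort on the spelling of the whole word."""
    return sorted(words, key=lambda w: plain("".join(w[1])))

def head_character_sort(words):
    """Sort by the sound of the head character, then by the head
    character itself (code-point order, an arbitrary stand-in),
    and only then by the rest of the word."""
    return sorted(
        words,
        key=lambda w: (plain(w[1][0]), w[0][0], plain("".join(w[1][1:]))),
    )

print([w[0] for w in single_sort(WORDS)])
print([w[0] for w in head_character_sort(WORDS)])
```

In the first ordering a reader who knows only the pronunciation of the whole word (shichang, shifei, shiqing, shishi) can go straight to it. In the second, the reader must first find the right head character among several pronounced shì and then search again within that character’s entries.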

Strongly recommended.

Chinese characters for Taiwanese–a new list from Taiwan’s MOE

Taiwan’s Ministry of Education has released a list of Chinese characters that can be used for writing common words in Taiwanese. (Note: PDF file.) I’ve provided a few examples at the end of this post.

The minister of education stated last week that students will not be tested on Chinese characters for Taiwanese, so I doubt there will be a widespread effort to learn these. Moreover, some of these characters are not presently in Unicode, making their use in practical applications at best difficult. (And even if they were in Unicode, that doesn’t mean fonts would include them or that a significant number of people would have such fonts.)

More characters and readings are to be released later. But since this list of just three hundred entries took the ministry four years to compile (not counting the many years various scholars worked on this before then), I don’t think anyone should be expecting much more to be released soon.

Here is the ministry’s press release on this.

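Regarding the work of sorting out the characters used to write Taiwanese Southern Min: from ROC year 84 to 92 (1995 to 2003) this committee commissioned a number of scholars to carry out the “Southern Min Original Characters Research Project” (閩南語本字研究計畫), which produced the eight-volume 《閩南語字彙》. From ROC year 90 to 93 (2001 to 2004) it also organized an editorial committee to compile the 《臺灣閩南語常用詞辭典》. In ROC year 92 (2003) the committee’s former chairman, Zheng Liangwei (鄭良偉), also headed the “Working Group on Characters for 300 Frequently Used Taiwanese Southern Min Words” (臺灣閩南語常用300詞用字計畫小組; renamed in 95 [2006], at the minister’s direction, the “Working Group on Sorting Out Basic Taiwanese Southern Min Characters and Words,” 整理臺灣閩南語基本字詞工作計畫小組), engaging experts and scholars to deliberate on questions of character use. The characters set out in this list draw together the results above and were decided by that working group over the course of many meetings.
For Taiwanese Southern Min words whose written forms vary, this list recommends suitable Chinese characters in the spirit of being easy to teach and easy to learn, respecting traditionally used characters while also attending to systematic correspondence between sounds and characters. The principles are set out below:

  1. Principle of traditional usage: Most of the characters selected for this list are popular characters in traditional use among the people, whether they are original characters (本字), characters borrowed for their meaning (訓用字), characters borrowed for their sound (借音字), or characters created in the Taiwan-Fujian area. For example:
    1. Original characters (本字):
      Most of the characters used in traditional written Taiwanese Southern Min are traditional characters, such as 「山」 (suann), 「水」 (tsuí), and 「天」 (thinn). Some words now differ in meaning or usage from modern Chinese, such as 「箸」 (tī, chopsticks), 「沃」 (ak, to water), 「行」 (kiânn, to walk), 「走」 (tsáu, to run), 「倩」 (tshiànn, to hire), 「晏」 (uànn, late), 「青盲」 (tshenn-mê, blind), and 「才調」 (tsâi-tiāu, ability). These are old Chinese words preserved in Taiwanese Southern Min, and their characters have long been in customary use, so this list, out of respect for tradition, adopts them as well.
      In addition, to meet the needs of writing Southern Min, the Taiwan-Fujian area often uses its own special characters; this list treats such “Taiwan-Fujian characters” (臺閩字) as original characters. Some of them, such as 「囝」 (kiánn, child) and 「粿」 (kué), were long ago included in Chinese character dictionaries and are naturally convenient to use. Others, such as 「亻因」 [webmaster’s note: written together as one character] (in, they) and 「**」 [webmaster’s note: see PDF for these characters] (tshit-thô, to play), have not yet been included in Chinese dictionaries, have not yet been assigned code points in Unicode, or lack font support. For these, one may temporarily use the “alternate characters” (異用字) recommended in this list or write them in the Taiwanese Southern Min romanization scheme (Tâi-lô, 臺羅).
    2. Characters borrowed for meaning (訓用字):
      These borrow the meaning of a Chinese character but are read with a Southern Min pronunciation, such as the 「穿」 in 「穿衫」 (tshīng sann, to put on clothes), 「仔」 (á), 「無」 (bô), 「瘦」 (sán), 「戇」 (gōng), 「挖」 (óo/ué), and 「會」 (ē). None of these is an original character; they are characters borrowed for meaning and are also listed as recommended characters.
    3. Characters borrowed for sound (借音字):
      These borrow the sound, or an approximate sound, of a Chinese character and are given a Southern Min meaning, such as 「嘛」 (mā, also), 「佳哉」 (ka-tsài, fortunately), 「膨」 (phòng, to swell), and the 「磅」 in 「磅空」 (pōng-khang, tunnel). None of these is an original character; they are characters borrowed for sound and are also listed as recommended characters.
  2. Principle of systematic sound-character correspondence: Where there is no traditionally used character, or where one character has several readings or one sound has several characters, so that confusion easily arises and makes reading or learning difficult, this list adopts two solutions, set out below:
    1. If the traditional popular character easily causes confusion, a meaning-borrowed character familiar from Mandarin is used instead. For example, the possessive ê and the measure word ê are both traditionally written 「个」, with the result that 「一个」 can be read either tsi̍t-ê or it–ê. This list therefore assigns 「个」 to the measure word, so that tsi̍t-ê is written 「一个」, while the possessive borrows the Mandarin 「的」 for its meaning, so that it–ê is written 「一的」 and guá-ê is written 「我的」.
    2. If the popular characters above may still cause confusion, the use of old Chinese characters is recommended, such as 「毋」 (m̄, not), 「佇」 (tī, at), 「媠」 (suí, beautiful), 「囥」 (khǹg, to put), 「跤」 (kha, foot), 「蠓」 (báng, mosquito), and 「濟」 (tsē, many), as well as 「吼」 (háu, to cry), 「誠」 (tsiânn, very), and 「冗」 (līng, loose).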

Here are nine entries from the list of three hundred.

建議用字 音讀 又音 對應華語 用例 異用字
recommended character pronunciation alternate reading corresponding Mandarin example alternate characters
ba̍k   目鏡、目眉  
bang   蚊子 蠓仔、蠓罩
蠻皮 bân-phuê bân-phê, bân-phêr 頑強不化 你真蠻皮 慢皮
bat pat 認識、曾經 捌字、捌去  
beh bueh, berh 要、如果、快要 欲食飯、欲知、強欲 要、卜
  微、細小、輕微 風微微仔吹、微微仔笑  
bīn   臉、面 面色、面熟  
明仔載 bîn-á-tsài miâ-á-tsài, bîn-nà-tsài 明天、明日 明仔載會好天 明仔再、明旦載
  無、沒有 無錢、無閒  

Tonally Orthographic Pinyin

Tonally Orthographic Pinyin (TOP) is a modification of Hanyu Pinyin that uses capitalization practices to distinguish between the various tones of Mandarin.

This conflicts with the conventional capitalization of sentence beginnings and proper nouns, so I have mixed feelings about it. But many find TOP useful as a learning tool and in writing text messages.

Here’s how TOP’s creator, Terry Thatcher Waltz, describes the system:

FIRST TONES ARE WRITTEN IN ALL CAPS. YOUR VOICE IS HIGH.

seconD toneS arE writteN witH thE lasT letteR capitalizeD. that’S becausE youR voicE haS tO risE.

third tones are written all lower case. that’s because the voice is low. (let’s keep discussions on the true nature of third and half-third tones somewhere else — this system is just to help us poor foreigners internalize tones!)

Fourth Tone Has The First Letter Of Each Word Capitalized, Because Your Voice Starts High And Then Falls Downward.

Thus, the phrase “wǒ měitiān liànxí Hànyǔ” would be written “wo meiTIAN LianxI Hanyu” in TOP.
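
The four rules are mechanical enough to sketch in a few lines of Python. This is only an illustration, not Waltz’s own tool: it assumes the input is Pinyin written with tone numbers (e.g. “mei3tian1”) rather than tone marks, the function names `top_syllable` and `to_top` are made up for the example, and syllables without a tone digit from 1 to 4 (such as neutral-tone syllables, which the description above does not cover) are left untouched.

```python
import re

# One syllable: a run of letters followed by a tone digit 1-4.
# Tone-number input is an assumption made for this sketch.
SYLLABLE = re.compile(r"([A-Za-zÜü]+)([1-4])")

def top_syllable(letters: str, tone: int) -> str:
    """Apply the TOP capitalization rule for one syllable."""
    s = letters.lower()
    if tone == 1:
        return s.upper()                  # first tone: ALL CAPS
    if tone == 2:
        return s[:-1] + s[-1].upper()     # second tone: lasT letter capitalized
    if tone == 3:
        return s                          # third tone: all lower case
    return s[0].upper() + s[1:]           # fourth tone: First letter capitalized

def to_top(numbered: str) -> str:
    """Convert tone-numbered Pinyin text to TOP, syllable by syllable."""
    return SYLLABLE.sub(
        lambda m: top_syllable(m.group(1), int(m.group(2))), numbered
    )

print(to_top("wo3 mei3tian1 lian4xi2 Han4yu3"))  # wo meiTIAN LianxI Hanyu
```

Note that the capital H of “Hanyu” in the output comes from the fourth tone, not from the word being a proper noun; any original capitalization is discarded, which is exactly the trade-off mentioned above.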

illiteracy among China’s disabled: official PRC survey

A total of 43.29 percent of China’s disabled people aged 15 or above are illiterate, according to the results of the second China National Sample Survey on Disability.*

This represents a considerable drop from the 59 percent illiteracy rate recorded in 1987, according to the survey.

The survey was conducted throughout China by government organizations, with 2,526,145 people in 771,797 households being interviewed between April 1 and May 31, 2006.

People classified as disabled comprise 6.34 percent of the PRC’s population, or about 83 million people. More than 75 percent of these live in rural areas.

* Regular readers of this site know that I have little faith in the accuracy of China’s official statistics, especially concerning the topic of literacy. But I do like to keep track of what the PRC is saying about this.

source: Most of China’s disabled not financialy [sic] independent: survey, Xinhua via People’s Daily, May 29, 2007

Google’s new ‘cross-language information retrieval’

Google has just added a “cross-language information retrieval” (CLIR) function to Google Translate.

Here is how Google describes it:

Now, you can search for something in your own language (for example, English) and search the web in another language (for example, French). If you’re looking for wine tasting events in Bordeaux while on vacation in France, just type “wine tasting events in Bordeaux” into the search box on the “Search results” tab on Google Translate. You’ll then get French search results and a (machine) translation of these search results into English. Similarly, an Arabic speaker could look for restaurants in New York, by searching for “???? ???????”; or a Chinese speaker could look for documents on machine learning on the English web by looking for “????”.

These are the languages available. For now, though, they do not work in all combinations, mainly just to or from English. (German and French are the only listed languages that can be paired with each other rather than with English.)

  • Arabic
  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Mandarin (in traditional characters)
  • Mandarin (in simplified characters)
  • Portuguese
  • Russian
  • Spanish

Japanese literacy–an SPP reissue

Here’s another re-release from the archives of Sino-Platonic Papers: Computers and Japanese Literacy: Nihonzin no Yomikaki Nôryoku to Konpyûta, by J. Marshall Unger of the Ohio State University’s Department of East Asian Languages and Literatures. The link above is to the PDF version (1.2 MB), which reproduces the original exactly.

This is a parallel text in Japanese (in romanization) and English, so if any of you want to practice reading romaji, here’s your chance.

The English text alone is available in HTML: Computers and Japanese Literacy.

The essay touches on many of the themes Unger explores in depth in his books, all of which have excerpts available here on Pinyin Info: The Fifth Generation Fallacy, Literacy and Script Reform in Occupation Japan, and Ideogram: Chinese Characters and the Myth of Disembodied Meaning.

Here is the opening, in both English and Japanese (in romanization).

Watakusi wa saikin, gendai no konpyûta siyô to Nihongo ni tuite kenkyu site orimasu. Gengogakusya mo konpyûta no nôryoku ya mondaiten ni tuite iken o happyo suru sekinin ga aru to omou kara desu.

I am currently engaged in research on contemporary computer usage and the Japanese language. Linguists too, I believe, have a responsibility to present their views on the potentials and problems of computers.

Sate, Amerika no zen-Kôsei Kyôiku tyôkan, John Gardner-si no kotoba de hazimetai to omoimasu. Sore wa “aizyô nasi no hihan to hihan nasi no aizyô” (Eigo de iu to, “unloving criticism and uncritical love”) to iu kotoba desu. Gardner-si wa, Amerikazin no aikokusyugi ni tuite Amerika o sukosi de mo hihan site wa ikenai to syutyô suru hito wa kangaetigai da, aizyô nasi ni syakai ya bunka no ketten o hihan bakari suru koto wa motiron warui keredo, hihan sore zitai o kiratte kokusuisyugi o susumeru koto mo syôrai no tame ni yoku nai, to iimasita. Kono koto wa bokoku igai no syakai to bunka ni tai suru baai de mo onazi de wa nai desyô ka? Gengogakusya ya rekisigakusya mo “aizyô nasi no hihan to hihan nasi no aizyô” to iu ryôkyokutan o sakeru yô ni sita hô ga ii to omou no desu. Watakusi wa Nihon no gengo to bunka o senmon ni site, Nihon ni tai site aizyô o motte orimasu kara koso, Nihongo no hyôkihô ya Nihonzin no yomikaki nôryoku ni tuite no teisetu o mondai ni site iru wake desu. Iwayuru zyôhôka syakai no zidai ni hairi, ippan no hitobito ga pasokon ya wâpuro o kozin-yô ni tukau yô ni naru ni turete, nettowâku tûsin, kyôiku-yô sohutowea, sôzôteki na puroguramingu nado ga yôkyû sarete kite iru desyô. Mosi sono konpon ni aru yomikaki nôryoku no henka to genzyô o gokai sureba, gôriteki na konpyûta siyôhô o kaihatu dekinai darô to omou kara desu.

Let me begin by quoting the former U.S. Secretary of Health, Education, and Welfare, John Gardner. I am thinking of his phrase “unloving criticism and uncritical love.” By this, he meant that it was wrong for proponents of American patriotism to oppose even the slightest criticism of the United States: although it is bad to dwell unsympathetically on finding fault with social and cultural shortcomings, it is equally bad for the future of society to advance nationalism and eschew all criticism. I think that this is also true when considering foreign societies and cultures.

Linguists and historians would do well to avoid the twin extremes of “unloving criticism and uncritical love.” As someone professionally involved with the language and culture of Japan, I have an affection for the country, but for that very reason, I wish to call into question the accepted theory of Japanese script and literacy. As we enter the age of the so-called informational society, and as more and more ordinary people begin to use computers on an individual basis, demands on network communications, educational software, creative programming, and so on, will steadily increase. Unless we understand the present situation and history of literacy, which underlies all these applications, we cannot hope to develop a rational basis for computer usage.

Sate, hyôi mozi to iu kotoba wa Nihongo ni tuite no hon ni yoku dete imasu kara kokugogaku no yôgo da to itte mo ii hodo desu ga, hyôi mozi to iu mono wa zissai ni sonzai site iru desyô ka? Kyakkanteki ni kangaete miru to, dono gengo mo konponteki ni wa hanasu mono desu. Mozi wa syakaiteki, rekisiteki na men ga arimasu ga, mozi wa kotoba no imi no moto de wa arimasen. Tatoeba, itizi mo yomenai mômoku no hito de mo, hoka no syôgai ga nai kagiri, bokokugo ga kanzen ni hanaseru yô ni narimasu. Sitagatte, hanasi-kotoba to wa mattaku kankei ga nai mozi nado to iu mono wa muimi na gainen desu. Gengo no imi wa gengo no kôzô kara hassei si, mozi wa sono han’ei de sika nai wake desu. Kore wa toku ni kore kara no konpyûta o kangaeru toki ni wasurete wa ikemasen….

The term “ideographic characters” appears so often in books on the Japanese language that one might say it has become a stock phrase of Japanese linguistics. I wonder, however, whether such things as “ideographs” actually exist. When examined objectively, all languages are fundamentally speech. Characters are not the source of the meanings of words, although they do have their social and historical aspects. For example, blind people who cannot read a single character can nonetheless speak their native tongues perfectly, unless they suffer from some other handicap. The very idea of characters totally divorced from speech is therefore meaningless. For the meaning of language emerges from the structure of language, of which writing is merely a reflection. It is particularly important that we not forget this when we consider the computers of the future….

This was first published in January 1988 as issue no. 6 of Sino-Platonic Papers.