software to test Mandarin pronunciation

Chinese scientists have developed a computer program to test how well people speak Mandarin Chinese.

The technology will help improve oral testing of Chinese and promote Mandarin Chinese both at home and abroad, said Fu Yong, former deputy director of the State Language Work Committee.

The technology was jointly developed by the Acoustics Institute and the Software Institute under the Chinese Academy of Sciences and Hong Kong Polytechnic University.

Lab experiments show that more than 98 percent of the results given by the computer evaluation system were the same as the results given by linguists, said Ju Qi, deputy director of the Acoustics Institute.

The system will be introduced to Mandarin Chinese examinations in Hong Kong’s middle schools and universities.

source: China resorts to computer to test Mandarin Chinese, People’s Daily, via Xinhua, May 23, 2007

reviews of books related to China and linguistics (2)

Sino-Platonic Papers has just released online its second compilation of book reviews. Here are the books discussed. (Note: The links below do not lead to the reviews but to other material. Use the link above.)

Invited Reviews

  • William A. Boltz, “The Typological Analysis of the Chinese Script.” A review article of John DeFrancis, Visible Speech, the Diverse Oneness of Writing Systems.
  • Paul Varley and Kumakura Isao, eds., Tea in Japan: Essays on the History of Chanoyu. Reviewed by William R. LaFleur.
  • Vladimir N. Basilov, ed., Nomads of Eurasia. Reviewed by David A. Utz.

Reviews by the Editor

  • “Philosophy and Language.” A review article of François Jullien, Procès ou Création: Une introduction à la pensée des lettrés chinois.

Language and Linguistics

  • W. South Coblin, A Handbook of Eastern Han Sound Glosses.
  • Weldon South Coblin. A Sinologist’s Handlist of Sino-Tibetan Lexical Comparisons.
  • ZHOU Zhenhe and YOU Rujie. Fangyan yu Zhongguo Wenhua [Topolects and Chinese Culture].
  • CHOU Fa-kao. Papers in Chinese Linguistics and Epigraphy.
  • ZENG Zifan. Guangzhouhua Putonghua Duibi Qutan [Interesting Parallels between Cantonese and Mandarin].
  • Luciana Bressan. La Determinazione delle Norme Ortografiche del Pinyin.
  • JIANG Shaoyu and XU Changhua, tr. Zhongguoyu Lishi Wenfa [A Historical Grammar of Modern Chinese] by OTA Tatsuo.
  • McMahon, et al. Expository Writing in Chinese.
  • P. C. T’ung and D. E. Pollard. Colloquial Chinese.
  • Li Sijing, Hanyu “er” Yin Shih Yanjiu [Studies on the History of the “er” Sound in Sinitic].
  • Maurice Coyaud, Les langues dans le monde chinois.
  • Patricia Herbert and Anthony Milner, eds., South-East Asia: Languages and Literatures; A Select Guide.
  • Andrew Large, The Artificial Language Movement.
  • Wilhelm von Humboldt, On Language: The Diversity of Human Language-Structure and Its Influence on the Mental Development of Mankind.
  • Vitaly Shevoroshkin, ed., Reconstructing Languages and Cultures.
  • Jan Wind, et al., eds., Studies in Language Origins.

Short Notices

  • A. Kondratov, Sounds and Signs.
  • Jeremy Campbell, Grammatical Man: Information, Entropy, Language, and Life.
  • Pitfalls of the Tetragraphic Script.

Lexicography and Lexicology

  • MIN Jiaji, et al., comp., Hanyu Xinci Cidian [A Dictionary of New Sinitic Terms]
  • LYU Caizhen, et al., comp., Xiandai Hanyu Nanci Cidian [A Dictionary of Difficult Terms in Modern Sinitic].
  • Tom McArthur, Worlds of Reference: Lexicography, learning and language from the clay tablet to the computer.

A Bouquet of Pekingese Lexicons

  • JIN Shoushen, comp., Beijinghua Yuhui [Pekingese Vocabulary].
  • SONG Xiaocai and MA Xinhua, comp., Beijinghua Ciyu Lishi [Pekingese Expressions with Examples and Explanations].
  • SONG Xiaocai and MA Xinhua, comp., Beijinghua Yuci Huishi [Pekingese Words and Phrases with Explanations].
  • FU Min and GAO Aijun, comp., Beijinghua Ciyu (Dialectical Words and Phrases in Beijing).

A Bibliographical Trilogy

  • Paul Fu-mien Yang, comp., Chinese Linguistics: A Selected and Classified Bibliography.
  • Paul Fu-mien Yang, comp., Chinese Dialectology: A Selected and Classified Bibliography.
  • Paul Fu-mien Yang, comp., Chinese Lexicology and Lexicography: A Selected and Classified Bibliography.

Orality and Literacy

  • Jack Goody. The interface between the written and the oral.
  • Jack Goody. The logic of writing and the organization of society.
  • Deborah Tannen, ed., Spoken and Written Language: Exploring Orality and Literacy.

Society and Culture

  • Scott Simmie and Bob Nixon, Tiananmen Square.
  • Thomas H. C. Lee, Government Education and Examinations in Sung China.
  • ZHANG Zhishan, tr. and ed., Zhongguo zhi Xing [Record of a Journey to China].
  • LIN Wushu, Monijiao ji Qi Dongjian [Manichaeism and Its Eastward Expansion].
  • E. N. Anderson, The Food of China.
  • K. C. Chang, ed., Food in Chinese Culture: Anthropological and Historical Perspectives.
  • Jacques Gernet, China and the Christian Impact: A Conflict of Cultures.
  • D. E. Mungello, Curious Land: Jesuit Accommodation and the Origins of Sinology.

Short Notice

  • Robert Jastrow, The Enchanted Loom: Mind in the Universe.

In Memoriam
Chang-chen HSU
August 6, 1957 – June 27, 1989

  • Hsu Chang-chen, ed., and tr., Yin-tu hsien-tai hsiao-shuo hsüan [A Selection of Contemporary Indian Fiction].
  • Hsu Chang-chen, T’o-fu tzu-hui yen-chiu (Mastering TOEFL Vocabulary).
  • Hsu Chang-chen, Tsui-chung-yao-te i pai ke Ying-wen tzu-shou tzu-ken (100 English Prefixes and Word Roots).
  • Hsu Chang-chen, Fa-wen tzu-hui chieh-kou fen-hsi — tzu-shou yü tzu-ken (Les préfixes et les racines de la langue française).
  • Hsu Chang-chen, comp. and tr., Hsi-yü yü Fo-chiao wen-shih lun-chi (Collection of Articles on Studies of Central Asia, India, and Buddhism).

This is SPP no. 14, from December 1989. The entire text is now online as a 7.3 MB PDF.

See my earlier post for the contents of the first SPP volume of reviews and a link to the full volume.

Google releases Pinyin input method for Windows, IE

Google has released a Pinyin-based character-input method for Windows systems. It offers a number of special features … which I don’t have time to detail right now, sorry. Read about them here: Google Gǔgē pīnyīn shūrùfǎ gōngnéng jièshào. And download the program from this page.

some character-input methods ‘Westernizing’ Chinese culture and making it ‘degenerate’: PRC official

Many of the stories I come across in my searches for news about Pinyin are related to input methods for Chinese characters. But I seldom find anything of interest in these. They tend to follow the same template: someone is touting a great new character-input method that is just so much better than Pinyin and everything else. It’s going to save Chinese characters and thus Chinese civilization and all that is good in the universe, etc. Blah, blah, blah. I just get bored.

But I recently came across one widely reprinted article that’s a bit more interesting and amusing/alarming/absurd. It has the additional advantage of being about the claims of a member of the PRC’s Chinese People’s Political Consultative Conference. Here’s the key paragraph:

Chén Duó wěiyuán shuō: “Shǒujī Hànzì shūrù jìshù yīlài wàiguó gōngsī zhìshǎo zàochéng sān dà wèntí. Shǒuxiān, wàiguó gōngsī de Hànzì shūrùfǎ pòhuài le wǒmen shǐyòng Hànyǔ Hànzì de chuántǒng sīwéi xíguàn, dǎozhì Hànwén huà yánghuà, yìhuà, tuìhuà; qícì, wàiguó gōngsī bù zhíxíng wǒguó 27,484 gè zì de qiángzhìxìng biāozhǔn, biānmǎ zì liáng zhǐyǒu 6,763 gè zì, zàochéng Hànzì shǐyòng hùnluàn, Hànzì wénběn xìnxī shīzhēn, yǐngxiǎng guójiā xìnxī ānquán; hái yǒu, Zhōngguó měinián huā jǐ yì yuán gòumǎi wàiguó gōngsī de Hànzì shūrù ruǎnjiàn, yèjiè liǎnmiàn hézài? Hànzì wénhuà de zūnyán, quánwēi bèi zhìyú hédì?”

Committee member Chen Duo said: “The reliance of mobile phones on foreign corporations’ Chinese character input technology creates at least three major problems. First, foreign corporations’ Chinese character input methods are destroying the traditional patterns for thinking about using Chinese characters and are Westernizing Chinese culture, [causing it to be] alienated and degenerate. Next, foreign corporations are not complying with our country’s compulsory standard of 27,484 characters, using instead only 6,763 characters, which wreaks chaos in the use of Chinese characters, distorts Chinese character text messages, and affects national information security. Also, China spends hundreds of millions of yuan every year on foreign corporations’ Chinese character input software. Where is the self-respect of the [domestic] industry? Where have the dignity and prestige of the culture of Chinese characters been put?”

About a week later Liu Naiqiang (刘廼强), another member of the Chinese People’s Political Consultative Conference, was touting the “fool” (shǎguā) character-input method, whatever that is, and warning against Pinyin.

Here is the whole article about Chen Duo:

“Wǒguó yǒu chāoguò 4.6 yì shǒujī yònghù, jū quánqiú dìyī, dàn yǒu jiǔchéng yònghù shūrù Hànzì shí, shǐyòng de shì wàiguó jìshù!” láizì xīnwén chūbǎnjiè de quánguó Zhèngxié wěiyuán Chén Duó zài quánguó Zhèngxié shí jiè wǔ cì huìyì gānggang kāishǐ shí, biàn tíjiāo le yī fèn zhǔnbèi hěn jiǔ de tí’àn, jiànyì jǐnkuài shíshī shùzì jiànpán Hànzì shūrù guójiā biāozhǔn, niǔzhuǎn wǒguó shǒujī Hànzì shūrù jìshù shòukòng yú wàiguó gōngsī de júmiàn.

Chén Duó wěiyuán shuō: “Shǒujī Hànzì shūrù jìshù yīlài wàiguó gōngsī zhìshǎo zàochéng sān dà wèntí. Shǒuxiān, wàiguó gōngsī de Hànzì shūrùfǎ pòhuài le wǒmen shǐyòng Hànyǔ Hànzì de chuántǒng sīwéi xíguàn, dǎozhì Hànwén huà yánghuà, yìhuà, tuìhuà; qícì, wàiguó gōngsī bù zhíxíng wǒguó 27,484 gè zì de qiángzhìxìng biāozhǔn, biānmǎ zì liáng zhǐyǒu 6,763 gè zì, zàochéng Hànzì shǐyòng hùnluàn, Hànzì wénběn xìnxī shīzhēn, yǐngxiǎng guójiā xìnxī ānquán; hái yǒu, Zhōngguó měinián huā jǐ yì yuán gòumǎi wàiguó gōngsī de Hànzì shūrù ruǎnjiàn, yèjiè liǎnmiàn hézài? Hànzì wénhuà de zūnyán, quánwēi bèi zhìyú hédì?”

Chén Duó jīngguò diàoyán huòxī, yóu Zhōngguórén zìzhǔ kāifā de guó bǐ shūrù jìshù zì liáng 27,484 gè, pīnyīn shūrù sùdù Bǐguó wài shūrùfǎ kuàijiāng jìn sì chéng, bǐhuà shūrù Bǐguó wài shūrùfǎ kuài yībàn, yīn xíng zǔhé shūrù Bǐguó wài pīnyīn shūrùfǎ kuài jìn qīchéng. Tā rènwéi, “guó bǐ cǎijí jìsuàn le shù bǎiyì zì de Zhōngguó bǎixìng xíguàn yòngyǔ yòng cí, yōngyǒu gèxìng huà de zhìnéng tiáopín wénzì shūrù fāng’àn, yínghé le Zhōngguó bǎixìng shǐyòng Hànyǔ Hànzì de chuántǒng sīwéi guànxìng, shǐ wénzì shūrù gèng liúchàng, fāngbiàn, shíyòng. “2006 nián 10 yuè, xìnxī chǎnyè bù zhàokāi le yǐ guó bǐ shūrùfǎ wéizhǔ dǎo de guójiā biāozhǔn 《xìnxī jìshù shùzì jiànpán Hànzì shūrù tōngyòng yāoqiú》 zhēngqiú yìjiàn huì, chàngyì quánguó gè dàshǒu jī shèjì shāng, zhìzàoshāng děng cǎiyòng wǒguó zìzhǔ chuàngxīn de Hànzì shūrùfǎ.

Chén Duó wěiyuán shuō, jǐnguǎn guó bǐ shūrù jìshù yǐ qiànrù le kāng jiā, jīn lì, yǔ lóng, TCL děng zhōngduān chǎnpǐn, dǎkāi le shìchǎng de quēkǒu, dàn yóuyú shūrù jìshù shìyǐ qiànrù jìshù de fāngshì jìnrù shìchǎng, zhǔnrù ménkǎn gāo, zhōuqī cháng; zhàn wǒguó 60% yǐshàng shǒujī shìchǎng de jǐ dàguó wài pǐnpái shāng, cúnzài cǎigòu wàiguó gōngsī ruǎnjiàn de guànxìng, yǒude guónèi shǒujī chǎngshāng yě mángmù chóngbài guówài chǎnpǐn; jiāshàng shuǐhuò shǒujī jí shǎo fùfèi děng yuányīn, guónèi de Hànzì shūrù jìshù yào yǔguó wài yǐjing xíngchéng lǒngduàn de gōngsī jìngzhēng, nándù fēicháng dà; jiāzhī zhè xiàng jìshù de ménkǎn jiàogāo, jíshǐ qiāndìng le hézuò xiéyì, cóngxīn shǒujī yánfā dào chéngshú de chǎnpǐn chūchǎng zhìshǎo xūyào 9 ge yuè de shíjiān, zhège guòchéng rúguǒ méiyǒu hěn hǎode jìshù bǎozhàng hédà liáng zījīn zhīchí, hěn nán wéichí xiaqu.

Wèicǐ, Chén Duó jiànyì guójiā yǒuguān bùmén cǎiqǔ qièshí cuòshī tuīdòng shùzì jiànpán Hànzì shūrù guójiā biāozhǔn de shíshī, jiāndū hé yǐndǎo yǒuguān shēngchǎn shāng zhíxíng guójiā biāozhǔn, tuījìn guóchǎn shǒujī Hànzì shūrù jìshù chǎnyèhuà, bìng cóng fúzhí zìzhǔ chuàngxīn de jiǎodù chūfā, duì qí jǐyǔ zhèngcèxìng zhīchí.

sources:

“只有顺着中文书写逻辑,以字形和笔顺为基础,不用学、不用记,人人都很快上手的‘傻瓜输入法’才能成为全球通用的中文输入法。国家应尽快将‘傻瓜输入法’开发出来。”全国政协委员刘廼强说。

现在社会上的中文输入法很多,像目前最流行的繁体“仓颉”、“简易”,简体的“五笔”等,但刘廼强认为它们是为要求速度的专业人员设计的,不适合现在人人都要自己输入,速度不是最重要要求的现实状况。

至于“拼音”输入法,刘廼强则认为,虽然繁简皆宜,更无须特别学和记,只要统一拼音标准,按道理是不错的全球通用的输入方法。“问题是中文不是语音语言,老用拼音输入法,很容易就会执笔忘字。实践证明,彻底拼音化决不是中文发展的正确方向,因而也不是中文输入应发展的方向,因为这样下去,中文便会萎缩灭亡。”

indicator of character frequency: a suggestion for programmers

It occurred to me the other day that many people, especially language learners, might find it useful to have a tool that would take text written in Chinese characters and mark it up according to the frequency of use of the individual characters within.

Here’s a sentence from a recent CCP rant (that is, news item) that can serve as an example:

非驴非马的“网语”不再满足于偏安网络一隅,正迅速向着其它媒体渗透,因而加剧了报纸电视等文字语言的混乱,玷污了汉语言文化的纯洁。
(Fēilǘfēimǎ de “wǎng yǔ” bùzài mǎnzú yú piān’ān wǎngluò yīyú, zhèng xùnsù xiàngzhe qítā méitǐ shèntòu, yīn’ér jiājù le bàozhǐ diànshì děng wénzì yǔyán de hùnluàn, diànwū le Hànyǔ yán wénhuà de chúnjié.)

Predictably, many of the characters here are extremely common. Others, however, would not even be covered under China’s definition of literacy. I’ve separated these characters into different classes, based on their frequencies of usage and applied different colors to each class:

  • character frequency: 1-100 (class i-c)
  • character frequency: 101-500 (class c-d)
  • character frequency: 501-1000 (class d-m)
  • character frequency: 1001-1500 (class m-md)
  • character frequency: 1501-2000 (class md-mm)
  • character frequency: beyond 2000 (class mmplus)

So the sample sentence would look like this:

非驴非马的“网语”不再满足于偏安网络一隅,正迅速向着其它媒体渗透,因而加剧了报纸电视等文字语言的混乱,玷污了汉语言文化的纯洁。

(Those of you reading this through RSS may need to visit the site to see what I’m talking about.)

The coding I used looks like this, though other approaches are possible:

<span class="c-d" title="101-500">非</span><span class="mmplus" title="2001+">驴</span>….

I added titles to make this more accessible.
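For anyone tempted to try, the markup could be generated along the following lines. This is only a minimal Python sketch, not a finished tool. It assumes you already have a list of characters ranked by frequency (most common first) from some corpus; that list, along with the function names `band_index`, `is_hanzi`, and `mark_up`, is hypothetical. The class names and titles follow the scheme given above, and characters missing from the list are simply treated as "2001+".

```python
from bisect import bisect_left
from html import escape

# Upper bound of each frequency band, with the class names and titles used above.
BOUNDS = [100, 500, 1000, 1500, 2000]
CLASSES = ["i-c", "c-d", "d-m", "m-md", "md-mm", "mmplus"]
TITLES = ["1-100", "101-500", "501-1000", "1001-1500", "1501-2000", "2001+"]


def band_index(rank):
    """Return the band index for a 1-based frequency rank (None means unranked)."""
    if rank is None:
        return len(BOUNDS)  # assumption: characters not in the list count as 2001+
    return bisect_left(BOUNDS, rank)


def is_hanzi(ch):
    """Rough test that only covers the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def mark_up(text, ranked_chars):
    """Wrap each Chinese character in a span that names its frequency class."""
    rank_of = {ch: i + 1 for i, ch in enumerate(ranked_chars)}
    out = []
    for ch in text:
        if is_hanzi(ch):
            i = band_index(rank_of.get(ch))
            out.append(f'<span class="{CLASSES[i]}" title="{TITLES[i]}">{ch}</span>')
        else:
            out.append(escape(ch))  # punctuation and other text pass through unmarked
    return "".join(out)
```

Calling `mark_up(sentence, ranked_chars)` with a real frequency list returns a string of spans in the same form as the snippet above. A style sheet then only needs one color rule per class.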

Perhaps adding a summary would be useful:

1-100              24.6%
101-500           42.1%
501-1000          8.8%
1001-1500         14.0%
1501-2000          1.8%
2001+              8.8%
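A summary like this is easy to compute once each character has a rank. The figures above are consistent with the 57 characters of the sample sentence (1 of 57 is 1.8%, 14 of 57 is 24.6%, and so on). Here is a rough Python sketch, again assuming a hypothetical frequency-ranked character list supplied by the user; `band_label`, `summarize`, and `format_summary` are names made up for illustration.

```python
from collections import Counter

# Upper bound and label for each band; anything rarer falls into LAST_BAND.
BANDS = [(100, "1-100"), (500, "101-500"), (1000, "501-1000"),
         (1500, "1001-1500"), (2000, "1501-2000")]
LAST_BAND = "2001+"


def band_label(rank):
    """Map a 1-based frequency rank to its band label (None means unranked)."""
    if rank is not None:
        for upper, label in BANDS:
            if rank <= upper:
                return label
    return LAST_BAND  # assumption: unranked characters are counted as 2001+


def summarize(text, ranked_chars):
    """Return the percentage of Chinese characters in text falling in each band."""
    rank_of = {ch: i + 1 for i, ch in enumerate(ranked_chars)}
    # Only the basic CJK Unified Ideographs block is counted; punctuation is ignored.
    hanzi = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    counts = Counter(band_label(rank_of.get(ch)) for ch in hanzi)
    total = len(hanzi)
    labels = [label for _, label in BANDS] + [LAST_BAND]
    return {label: (100 * counts[label] / total if total else 0.0) for label in labels}


def format_summary(summary):
    """Render the summary as aligned text lines, one band per line."""
    return "\n".join(f"{label:<12}{pct:5.1f}%" for label, pct in summary.items())
```

Percentages are taken over Chinese characters only, so punctuation and Latin letters do not dilute the totals.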

This approach could also be used for Japanese: for example, to highlight all kanji not included in the Jōyō kanji, or to highlight different sets of the Kyōiku kanji. For that matter, it could also be applied to written words in English or other languages that use alphabets, though conjugations, plurals, and the like would complicate matters.

So, would anyone like to try coming up with one of these? Or has it been done already?

one possible resource:

Adso now available for download

David Lancashire’s wonderful Adso — which I tend to use primarily for conversions into Pinyin (under Style, select Pinyin) but which can handle much, much more — is now available for download as a Unix binary. A Windows version is expected soon.

This is fully-featured non-crippleware and should run on most modern linux distributions. To my knowledge, it is also the first reasonably-functional and freely-downloadable machine translation and NLP engine in the world.

If I were even half the programmer I ought to be, I’d snap this up in an instant.

Chinese Characters as a High-Maintenance Script and the Consequences Thereof

The following is a guest post by Prof. Victor H. Mair of the University of Pennsylvania.

——————

Anyone who has taken it upon him/herself to become literate in Chinese characters realizes what a tremendous commitment is required to master the thousands of different graphs that are necessary for reading and writing. Great as the initial expenditure of time and energy is, one must continue to practice reading and writing the characters on an almost daily basis if one is to maintain a workable degree of proficiency. Furthermore, since character production is a skill that requires a high level of neuro-muscular coordination, failure to practice them regularly inevitably results in a rapid deterioration of the ability to write with facility.

In the world of the 21st century, however, there are countless distractions that compete with the Chinese script for the attention of its users: TV, movies, computers, cell phones, video games, iPods, sports, music, dance, and so forth. Every minute or hour devoted to such devices and diversions means less time for practicing the demanding script. In addition, many of these competitors directly or indirectly displace or obviate the script itself. For example, the vast majority of Sinitic language inputting for computers is done via pinyin (Romanization), and the same is true for short text messaging on cell phones which is so ubiquitous in East Asia. Countless studies and endless testimonies from individual users have shown that reliance on computers and other electronic devices to produce written character texts dramatically reduces the ability of users of the Chinese script to form the characters accurately and, to a lesser extent, even diminishes a reader’s ability to distinguish characters.

Some of this was pointed out already in Jennifer 8. Lee’s lengthy and well-researched article entitled “Where the PC Is Mightier Than the Pen: In China, Computer Use Erodes Traditional Handwriting, Stirring a Cultural Debate,” which appeared in the Technology News section of the New York Times on February 1, 2001. Here’s an abstract of Ms. Lee’s article, which was illustrated with photographs:

Use of computers for word processing appears to be taking a toll on Chinese speakers’ ability to write characters by hand; many Chinese fear that computer could undermine written language, which has great cultural significance for Chinese people, but others say the point of language is communication and nothing more; erosion of traditional handwriting skills arises from forcing complexities of Chinese language to conform to standard Roman-alphabet keyboard.

William Hannas, an expert on East Asian writing systems, has perceptively and persuasively pointed out that character production and recognition are intimately linked:

Educators speak too facilely of the distinction between character “recognition skills” and the skills needed to produce them by hand, as if the two were completely independent. In fact, there is much experimental and anecdotal evidence to support a connection between the two types of skills. As one’s ability physically to write Chinese characters, stroke by stroke, improves, so it seems does one’s ability to recognize them and distinguish one from the other. Conversely, as writing skills deteriorate from lack of practice, so does recognition. Primitive motor skills seem to play a part in reinforcing memory here as in other areas. {Original note: Kaiho Hiroyuki summarizes the results of experiments that demonstrate that character recognition is affected by users’ ability to draw them and that users’ appraisal of a character’s complexity depends more on stroke count than on the number of lines actually present in the character. “Nihongo no hyôki kôdô no ninchi shinrigakuteki bunseki,” Nihongogaku, 6 (1987), 65-71.}

If this phenomenon were related to handwriting specifically, literacy would have been lost in the West entirely by now, for most Westerners do their “writing” today on keyboards. But the fact is, typing has reinforced Westerners’ “hands on” awareness of the language by virtue of the direct one-to-one correspondence between discrete hand motions and the letters that make up the words. Character coding schemes, as we have seen, have little or no direct physical connection with the structure of the character — certainly none that bears any relationship to the specific motor skills that are exercised in forming characters. Although it seems unlikely, for all of the reasons given above, that nonphonetic coding will emerge as the primary means of processing Chinese characters for a significant part of the character-literate East Asian population, if this were to happen, the technique could lead eventually to a deterioration of users’ ability to deal with the characters generally. In other words, the same machines that were supposed to give the characters a new lease on life may contain the seeds of the characters’ destruction. {Asia’s Orthographic Dilemma (Honolulu: University of Hawaii Press, 1997), pp. 271-271, 314. 322.}

This is all the more true of phonetic inputting schemes for characters, which — though extremely easy to learn and use — are completely divorced from the shapes of the characters.

The diminution of the ability to produce and recognize characters resulting from electronic interventions has already reached a significant stage. As the number of distractions and displacements increases, which is a virtual certainty considering the rapid pace of invention and the growing impact of such devices, the level of dysfunctionality in character production and recognition is bound to advance from significant to serious.

Such competitors (computers, BlackBerries, and so on) pose far less of a threat to alphabetic scripts than to the characters for the following reasons:

  1. Alphabetic scripts require a far smaller initial investment and a fraction of the effort for maintenance.
  2. Many of the electronic devices mentioned above actually reinforce or improve writing in alphabetical scripts (spell checkers, grammar checkers, and so on [e-mail style, of course, is another matter altogether] — there are no comparable tools for Chinese).
  3. When one forgets how to write a character, one is usually stymied for that particular morpheme, whereas misspelling a word generally presents no obstacle to expression or understanding.

The implications of electronic information processing devices for the Chinese script are only beginning to be felt. As they increase in scope and availability, the adverse effects for character production and recognition will grow exponentially till they reach a genuine crisis.

China’s script-reform officials remark on ‘Internet language’

A few weeks ago the Guangming Daily asked several authorities their ideas on “Internet language” (wǎngluò yǔyán / 网络语言), the mix of abbreviated English and Pinyin along with slang that characterizes much of what is written on Internet chat services and the like.

Since three of those interviewed — Su Peicheng (Sū Péichéng / 苏培成), president of the PRC-government-sponsored Society for the Modernization of the Chinese Language (Zhōngguó Yǔwén Xiàndàihuà Xuéhuì); Qian Yuzhi (Qián Yùzhǐ / 钱玉趾), a member of the same group as Su; and Feng Zhiwei (Féng Zhìwěi / 冯志伟), a research fellow with the PRC Ministry of Education’s Institute of Applied Linguistics’s computational linguistics department — are in important positions related to script reform in China, their thoughts are worth noting. Not surprisingly, they aren’t particularly supportive of it. Su particularly stresses the need to instruct young people in the “harm” of using Internet language.

The fourth member of the group is Wu Zhiwei (Wǔ Zhìwěi / 武志伟), who works at the CCTV website.

source: Rúhé kàndài “Wǎngluò yǔyán” (如何看待“网络语言”), Guangming Ribao, December 7, 2006