Google’s new ‘cross-language information retrieval’

Google has just added a “cross-language information retrieval” (CLIR) function to Google Translate.

Here is how Google describes it:

Now, you can search for something in your own language (for example, English) and search the web in another language (for example, French). If you’re looking for wine tasting events in Bordeaux while on vacation in France, just type “wine tasting events in Bordeaux” into the search box on the “Search results” tab on Google Translate. You’ll then get French search results and a (machine) translation of these search results into English. Similarly, an Arabic speaker could look for restaurants in New York, by searching for “???? ???????”; or a Chinese speaker could look for documents on machine learning on the English web by looking for “????”.

These are the languages available, though for now these are not available in all combinations but mainly to or from English. (German and French are the only languages listed that can work with each other rather than English.)

  • Arabic
  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Mandarin (in traditional characters)
  • Mandarin (in simplified characters)
  • Portuguese
  • Russian
  • Spanish


Japanese literacy–an SPP reissue

Here’s another re-release from the archives of Sino-Platonic Papers: Computers and Japanese Literacy: Nihonzin no Yomikaki Nôryoku to Konpyûta, by J. Marshall Unger of the Ohio State University’s Department of East Asian Languages and Literatures. The link above is to the PDF version (1.2 MB), which reproduces the original exactly.

This is a parallel text in Japanese (in romanization) and English, so if any of you want to practice reading romaji, here’s your chance.

The English text alone is available in HTML: Computers and Japanese Literacy.

The essay touches on many of the themes Unger explores in depth in his books, all of which have excerpts available here on Pinyin Info: The Fifth Generation Fallacy, Literacy and Script Reform in Occupation Japan, and Ideogram: Chinese Characters and the Myth of Disembodied Meaning.

Here is the opening, in both English and Japanese (in romanization).

Watakusi wa saikin, gendai no konpyûta siyô to Nihongo ni tuite kenkyu site orimasu. Gengogakusya mo konpyûta no nôryoku ya mondaiten ni tuite iken o happyo suru sekinin ga aru to omou kara desu.

I am currently engaged in research on contemporary computer usage and the Japanese language. Linguists too, I believe, have a responsibility to present their views on the potentials and problems of computers.

Sate, Amerika no zen-Kôsei Kyôiku tyôkan, John Gardner-si no kotoba de hazimetai to omoimasu. Sore wa “aizyô nasi no hihan to hihan nasi no aizyô” (Eigo de iu to, “unloving criticism and uncritical love”) to iu kotoba desu. Gardner-si wa, Amerikazin no aikokusyugi ni tuite Amerika o sukosi de mo hihan site wa ikenai to syutyô suru hito wa kangaetigai da, aizyô nasi ni syakai ya bunka no ketten o hihan bakari suru koto wa motiron warui keredo, hihan sore zitai o kiratte kokusuisyugi o susumeru koto mo syôrai no tame ni yoku nai, to iimasita. Kono koto wa bokoku igai no syakai to bunka ni tai suru baai de mo onazi de wa nai desyô ka? Gengogakusya ya rekisigakusya mo “aizyô nasi no hihan to hihan nasi no aizyô” to iu ryôkyokutan o sakeru yô ni sita hô ga ii to omou no desu. Watakusi wa Nihon no gengo to bunka o senmon ni site, Nihon ni tai site aizyô o motte orimasu kara koso, Nihongo no hyôkihô ya Nihonzin no yomikaki nôryoku ni tuite no teisetu o mondai ni site iru wake desu. Iwayuru zyôhôka syakai no zidai ni hairi, ippan no hitobito ga pasokon ya wâpuro o kozin-yô ni tukau yô ni naru ni turete, nettowâku tûsin, kyôiku-yô sohutowea, sôzôteki na puroguramingu nado ga yôkyû sarete kite iru desyô. Mosi sono konpon ni aru yomikaki nôryoku no henka to genzyô o gokai sureba, gôriteki na konpyûta siyôhô o kaihatu dekinai darô to omou kara desu.

Let me begin by quoting the former U.S. Secretary of Health, Education, and Welfare, John Gardner. I am thinking of his phrase “unloving criticism and uncritical love.” By this, he meant that it was wrong for proponents of American patriotism to oppose even the slightest criticism of the United States: although it is bad to dwell unsympathetically on finding fault with social and cultural shortcomings, it is equally bad for the future of society to advance nationalism and eschew all criticism. I think that this is also true when considering foreign societies and cultures.

Linguists and historians would do well to avoid the twin extremes of “unloving criticism and uncritical love.” As someone professionally involved with the language and culture of Japan, I have an affection for the country, but for that very reason, I wish to call into question the accepted theory of Japanese script and literacy. As we enter the age of the so-called informational society, and as more and more ordinary people begin to use computers on an individual basis, demands on network communications, educational software, creative programming, and so on, will steadily increase. Unless we understand the present situation and history of literacy, which underlies all these applications, we cannot hope to develop a rational basis for computer usage.

Sate, hyôi mozi to iu kotoba wa Nihongo ni tuite no hon ni yoku dete imasu kara kokugogaku no yôgo da to itte mo ii hodo desu ga, hyôi mozi to iu mono wa zissai ni sonzai site iru desyô ka? Kyakkanteki ni kangaete miru to, dono gengo mo konponteki ni wa hanasu mono desu. Mozi wa syakaiteki, rekisiteki na men ga arimasu ga, mozi wa kotoba no imi no moto de wa arimasen. Tatoeba, itizi mo yomenai mômoku no hito de mo, hoka no syôgai ga nai kagiri, bokokugo ga kanzen ni hanaseru yô ni narimasu. Sitagatte, hanasi-kotoba to wa mattaku kankei ga nai mozi nado to iu mono wa muimi na gainen desu. Gengo no imi wa gengo no kôzô kara hassei si, mozi wa sono han’ei de sika nai wake desu. Kore wa toku ni kore kara no konpyûta o kangaeru toki ni wasurete wa ikemasen….

The term “ideographic characters” appears so often in books on the Japanese language that one might say it has become a stock phrase of Japanese linguistics. I wonder, however, whether such things as “ideographs” actually exist. When examined objectively, all languages are fundamentally speech. Characters are not the source of the meanings of words, although they do have their social and historical aspects. For example, blind people who cannot read a single character can nonetheless speak their native tongues perfectly, unless they suffer from some other handicap. The very idea of characters totally divorced from speech is therefore meaningless. For the meaning of language emerges from the structure of language, of which writing is merely a reflection. It is particularly important that we not forget this when we consider the computers of the future….

This was first published in January 1988 as issue no. 6 of Sino-Platonic Papers.

dictionary compilation’s four D’s and ‘spiritual ecstasy’

The Shanghai Daily has a profile of Lu Gusun (Lù Gǔsūn, 陆谷孙, 陸谷孫), editor-in-chief of the plainly titled English-Chinese Dictionary (Yīng-Hàn dà cídiǎn, 《英漢大詞典》). The second edition of this dictionary, which was released earlier this month, contains more than 20,000 new entries, an increase of some 10 percent.

Lu has spent most of his academic life at Fudan University, at which in 1965 he earned a master’s in foreign languages and literature. He stayed on as a teacher specializing in Shakespeare. But then came the Cultural Revolution. “Those were the days when the world could not tolerate a peaceful desk for study,” Lu said simply.

Criticized as bourgeois, he had to recite the poems of Alexander Pushkin after a day’s hard labor.

I wonder if the reporting here is accurate, as being forced to recite Pushkin would have been a very strange punishment from a number of standpoints — even in those very strange times.

Lu was forced out of teaching and assigned to compile dictionaries.

In 1970, Lu participated in compiling the New English-Chinese Dictionary, which is still available and has sold more than 10 million copies over the years.

One of the reasons for its vitality was the fact that Lu “smuggled” in many up-to-date words and expressions. Otherwise it would have been staid and quickly dated….

In 1975, Lu and a team of scholars started work on the English-Chinese Dictionary and he was appointed editor-in-chief in 1986.

It took them 16 years to finish the award-winning dictionary, and the team of compilers shrank to 17 people from 108 at the peak.

“To compile a dictionary, you have to bear the loneliness and resist various temptations,” says Lu. “Many partners gave it up for more lucrative posts, some went abroad, some started their own businesses and some died out of devotion to the creation of the dictionary.” A compiler in his 40s passed away just three months before the dictionary was published.

As for Lu, he used coffee, cigarettes, mustard and even alcohol to sustain his fighting spirit. He promised not to go abroad, publish books or take any part-time teaching jobs until the dictionary was complete….

“It is a solemn battle,” says Lu. “Only those who have experienced this can understand the solemnity…. The process of dictionary compilation is always plagued by the four Ds — namely, delays, deficits, delinquencies and deficiencies. But there is spiritual ecstasy that you can hardly experience elsewhere.”

Although Lu has formally retired from Fudan University, he continues to deliver popular lectures in English twice a week to freshmen, and he advises graduate students.

“I hope colleges can be a wonderland, not a wasteland for young people. They should have their minds sharpened and their lives enriched here,” he says.

“Some colleges now make training leaders their main target. But this goal can deprive students of many pure pleasures and undermine their enthusiasm for academic achievements,” the professor adds.

source: Prof inspires ‘spiritual ecstasy’, Shanghai Daily, May 15, 2007

S.P.: Surf’s uP

Here’s an image from the Taiwanese poster for the forthcoming movie Surf’s Up. Surfing penguins — I’m in no hurry to see this.

But I do very much like the lettering (in all senses), which incorporates two of the letters of the original title in the Chinese characters for the Mandarin name (Chōnglàng jìjié, 衝浪季節). (The title might end up translated differently in China.)

衝浪季節 -- with an S taking the place of the 'water' bushou in the 2nd character and a P taking the place of the '卩' in the final character

In the second character an S takes the place of the “water” bushou 氵 (“radical” is a misleading translation, so it’s best to avoid that word). And in the final character a P takes the place of the “卩”.

衝浪季節

See also JOHNNY DePP AND CHINeSe CHARACTeRS.

results of Hong Kong tests in Mandarin and English

The government of Hong Kong has released the results of February’s proficiency exams for prospective teachers of English and of Mandarin. A total of 1,836 candidates took the English exam, while 2,209 candidates were tested in Mandarin.

Here are the percentages of candidates attaining level 3, the basic proficiency requirement for language teachers, in 2007:

  • English
    • 78.8% in reading
    • 38.3% in writing
    • 80.4% in listening
    • 47.7% in speaking
    • 92.7% in classroom-language assessment
  • Mandarin
    • 39.6% in listening and recognition
    • 56.5% in Pinyin
    • 35.6% in speaking
    • 83.4% in classroom-language assessment

Percentages of candidates attaining level 3, the basic proficiency requirement for language teachers, in 2006:

  • English
    • 85.5% in reading
    • 45.9% in writing
    • 74.3% in listening
    • 37.0% in speaking
    • 92.7% in classroom-language assessment (exactly the same as in 2007 — strange)
  • Mandarin
    • 54% in listening and recognition
    • 50% in Pinyin
    • 38% in speaking
    • 85% in classroom-language assessment
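For anyone who wants the year-to-year shifts spelled out, here is a minimal sketch that subtracts the 2006 pass rates from the 2007 ones. The figures are copied from the two lists above; the names `RESULTS` and `point_changes` are my own, purely for illustration. Keep in mind that the 2006 Mandarin figures are given only as whole numbers, so those differences are approximate.

```python
# Percentage of candidates attaining level 3, as (2006, 2007) pairs
# copied from the two lists above.
RESULTS = {
    "English": {
        "reading": (85.5, 78.8),
        "writing": (45.9, 38.3),
        "listening": (74.3, 80.4),
        "speaking": (37.0, 47.7),
        "classroom-language assessment": (92.7, 92.7),
    },
    "Mandarin": {
        "listening and recognition": (54.0, 39.6),
        "Pinyin": (50.0, 56.5),
        "speaking": (38.0, 35.6),
        "classroom-language assessment": (85.0, 83.4),
    },
}


def point_changes(results):
    """Return the 2006-to-2007 change in percentage points for each paper."""
    return {
        language: {
            paper: round(rate_2007 - rate_2006, 1)
            for paper, (rate_2006, rate_2007) in papers.items()
        }
        for language, papers in results.items()
    }


if __name__ == "__main__":
    for language, papers in point_changes(RESULTS).items():
        print(language)
        for paper, change in papers.items():
            print(f"  {paper}: {change:+.1f} points")
```

These are differences in percentage points, not relative changes, and the candidates in the two years were not the same people, so the numbers say nothing about individual improvement.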


reviews of books related to China and linguistics (2)

Sino-Platonic Papers has just released online its second compilation of book reviews. Here are the books discussed. (Note: The links below do not lead to the reviews but to other material. Use the link above.)

Invited Reviews

  • William A. Boltz, “The Typological Analysis of the Chinese Script.” A review article of John DeFrancis, Visible Speech, the Diverse Oneness of Writing Systems.
  • Paul Varley and Kumakura Isao, eds., Tea in Japan: Essays on the History of Chanoyu. Reviewed by William R. LaFleur.
  • Vladimir N. Basilov, ed., Nomads of Eurasia. Reviewed by David A. Utz.

Reviews by the Editor

  • “Philosophy and Language.” A review article of François Jullien, Procès ou Création: Une introduction à la pensée des lettrés chinois.

Language and Linguistics

  • W. South Coblin, A Handbook of Eastern Han Sound Glosses.
  • Weldon South Coblin. A Sinologist’s Handlist of Sino-Tibetan Lexical Comparisons.
  • ZHOU Zhenhe and YOU Rujie. Fangyan yu Zhongguo Wenhua [Topolects and Chinese Culture].
  • CHOU Fa-kao. Papers in Chinese Linguistics and Epigraphy.
  • ZENG Zifan. Guangzhouhua Putonghua Duibi Qutan [Interesting Parallels between Cantonese and Mandarin].
  • Luciana Bressan. La Determinazione delle Norme Ortografiche del Pinyin.
  • JIANG Shaoyu and XU Changhua, tr. Zhongguoyu Lishi Wenfa [A Historical Grammar of Modern Chinese] by OTA Tatsuo.
  • McMahon, et al. Expository Writing in Chinese.
  • P. C. T’ung and D. E. Pollard. Colloquial Chinese.
  • Li Sijing, Hanyu “er” Yin Shih Yanjiu [Studies on the History of the “er” Sound in Sinitic].
  • Maurice Coyaud, Les langues dans le monde chinois.
  • Patricia Herbert and Anthony Milner, eds., South-East Asia: Languages and Literatures; A Select Guide.
  • Andrew Large, The Artificial Language Movement.
  • Wilhelm von Humboldt, On Language: The Diversity of Human Language-Structure and Its Influence on the Mental Development of Mankind.
  • Vitaly Shevoroshkin, ed., Reconstructing Languages and Cultures.
  • Jan Wind, et al., eds., Studies in Language Origins.

Short Notices

  • A. Kondratov, Sounds and Signs.
  • Jeremy Campbell, Grammatical Man: Information, Entropy, Language, and Life.
  • Pitfalls of the Tetragraphic Script.

Lexicography and Lexicology

  • MIN Jiaji, et al., comp., Hanyu Xinci Cidian [A Dictionary of New Sinitic Terms].
  • LYU Caizhen, et al., comp., Xiandai Hanyu Nanci Cidian [A Dictionary of Difficult Terms in Modern Sinitic].
  • Tom McArthur, Worlds of Reference: Lexicography, learning and language from the clay tablet to the computer.

A Bouquet of Pekingese Lexicons

  • JIN Shoushen, comp., Beijinghua Yuhui [Pekingese Vocabulary].
  • SONG Xiaocai and MA Xinhua, comp., Beijinghua Ciyu Lishi [Pekingese Expressions with Examples and Explanations].
  • SONG Xiaocai and MA Xinhua, comp., Beijinghua Yuci Huishi [Pekingese Words and Phrases with Explanations].
  • FU Min and GAO Aijun, comp., Beijinghua Ciyu (Dialectical Words and Phrases in Beijing).

A Bibliographical Trilogy

  • Paul Fu-mien Yang, comp., Chinese Linguistics: A Selected and Classified Bibliography.
  • Paul Fu-mien Yang, comp., Chinese Dialectology: A Selected and Classified Bibliography.
  • Paul Fu-mien Yang, comp., Chinese Lexicology and Lexicography: A Selected and Classified Bibliography.

Orality and Literacy

  • Jack Goody. The interface between the written and the oral.
  • Jack Goody. The logic of writing and the organization of society.
  • Deborah Tannen, ed., Spoken and Written Language: Exploring Orality and Literacy.

Society and Culture

  • Scott Simmie and Bob Nixon, Tiananmen Square.
  • Thomas H. C. Lee, Government Education and Examinations in Sung China.
  • ZHANG Zhishan, tr. and ed., Zhongguo zhi Xing [Record of a Journey to China].
  • LIN Wushu, Monijiao ji Qi Dongjian [Manichaeism and Its Eastward Expansion].
  • E. N. Anderson, The Food of China.
  • K. C. Chang, ed., Food in Chinese Culture: Anthropological and Historical Perspectives.
  • Jacques Gernet, China and the Christian Impact: A Conflict of Cultures.
  • D. E. Mungello, Curious Land: Jesuit Accommodation and the Origins of Sinology.

Short Notice

  • Robert Jastrow, The Enchanted Loom: Mind in the Universe.

In Memoriam
Chang-chen HSU
August 6, 1957 – June 27, 1989

  • Hsu Chang-chen, ed., and tr., Yin-tu hsien-tai hsiao-shuo hsüan [A Selection of Contemporary Indian Fiction].
  • Hsu Chang-chen, T’o-fu tzu-hui yen-chiu (Mastering TOEFL Vocabulary).
  • Hsu Chang-chen, Tsui-chung-yao-te i pai ke Ying-wen tzu-shou tzu-ken (100 English Prefixes and Word Roots).
  • Hsu Chang-chen, Fa-wen tzu-hui chieh-kou fen-hsi — tzu-shou yü tzu-ken (Les préfixes et les racines de la langue française).
  • Hsu Chang-chen, comp. and tr., Hsi-yü yü Fo-chiao wen-shih lun-chi (Collection of Articles on Studies of Central Asia, India, and Buddhism).

This is SPP no. 14, from December 1989. The entire text is now online as a 7.3 MB PDF.

See my earlier post for the contents of the first SPP volume of reviews and a link to the full volume.

Banqiao’s orificial signage

David, who for just a little while longer lives in the same Banqiao neighborhood as I, sent me a photo of a street sign in our highly populated but little-discussed city.

'Guanciao W. Rd.': streetsign in Banqiao, Taiwan, labeled in misspelled Tongyong Pinyin and English

The sign tells us this is “Guanciao” West Road. In Hanyu Pinyin this would be “Guanqiao.” Guanqiao? The only word in my biggest Mandarin-English dictionary under that spelling is guānqiào (關竅/关窍), which is defined as “orifices on the human body.” Hmm. Taiwan might have the questionable taste of having many a road still named after a dead dictator, but orifices?

This oddity is explained by the fact that Banqiao is simply continuing its tradition of typos — even on relatively new signs. (The style of the sign and the choice of Tongyong Pinyin both indicate this went up within the past few years.)

Guanciao (Guanqiao) should be Guancian. (In Hanyu Pinyin, 館前西路 is written Guǎnqián Xīlù.) It’s worth noting this is not a tiny lane but a road in a well-traveled part of town.

As long as I’m putting up yet another post with photos and doing further damage to my reputation of having one of the Taiwan blogosphere’s fastest-loading, least Turtonesque sites*, I might as well go ahead and add one more so I can mention something else about this sign.

Let’s look at the relative size of the Chinese characters and the alphabetic text. The majority of the letters are but one quarter of the height of the Chinese characters.

sign showing the relative percentages of the height of the letters/Hanzi on the sign

Although in this particular case the lettering might not be too small, this style often leads to nearly illegible romanization, especially on signs posted high above streets.

* Just in terms of the average number of photos per post, that is. (But that’s in part because I’m a lousy photographer.) Congratulations, Michael, on reaching two thousand posts!

Taipei’s new busstop signs

white on gray busstop sign reading 'Tianmu New Village', with Chinese characters

The Taipei City Government has begun to replace busstop signs throughout the city.

The color scheme of the new signs, however, is a poor choice because white letters against a gray background offer little contrast, especially at night.

Here, for example, are daytime and nighttime shots of the same stop, taken from different angles.

shot of the busstop signs during the day

nighttime shot of the busstop sign, showing the low level of contrast at night

The stop is not lit well, so the nighttime photo had to be taken with a flash. So this photo, though the focus came out a little fuzzy, represents an improvement over what people would normally see at night.
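To put a rough number on “little contrast,” here is a minimal sketch that applies the contrast-ratio formula from the W3C’s Web Content Accessibility Guidelines (WCAG). Two loud caveats: that formula was designed for text on screens, not for painted or printed street signs, and the gray value used below, (128, 128, 128), is a hypothetical mid-gray of my own choosing rather than a measurement of the actual sign. The function names are likewise just for illustration.

```python
def _linear(channel):
    """Convert one 8-bit sRGB channel (0-255) to linear light."""
    c = channel / 255
    # 0.04045 is the sRGB threshold; the 0.03928 printed in WCAG 2.x
    # gives identical results for 8-bit values.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, per the WCAG definition."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1.0 (none) to 21.0 (maximum)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    WHITE = (255, 255, 255)
    ASSUMED_GRAY = (128, 128, 128)  # hypothetical; not measured from the sign
    BLACK = (0, 0, 0)
    print(f"white on assumed gray: {contrast_ratio(WHITE, ASSUMED_GRAY):.2f}:1")
    print(f"white on black:        {contrast_ratio(WHITE, BLACK):.2f}:1")
```

With that assumed gray, white lettering comes out a little under 4:1, which is below the 4.5:1 minimum WCAG sets for normal-size text on screens. A lighter gray would fare worse still; white on black, by comparison, reaches the maximum of 21:1.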

Two weeks ago I wrote the Taipei City Government’s Department of Transportation for clarification about the policies associated with this signage but have not received an answer. Everyone I have spoken with in that office has been friendly; but the system is unfortunately still stuck in its ways.

At least InTerCaPiTaLiZaTion doesn’t seem to be in effect. And the new style for displaying bus routes does provide more information to those who cannot read Chinese characters.

List of busstops for the Taipei 220 bus, as given on the new style (spring 2007) of busstop signage.