Google Translate and romaji revisited

OK, Google has improved its Pinyin converter some, though it still fails in important areas. So that’s the present situation for Google and Mandarin.

How about for Google and Japanese?

Professor J. Marshall Unger of the Ohio State University’s Department of East Asian Languages and Literatures generously agreed to reexamine Google’s performance in conversions to rōmaji (Japanese written in romanization).

Below is his latest evaluation.

For his initial analysis (in December 2009), see Google Translate and rōmaji.

I ran the test passage through Google Translate again. There’s some improvement, but it’s still pretty mediocre.

Original Google Translate
6日午後4時35分ごろ、東京都千代田区皇居外苑の都道(内堀通り)の二重橋前交差点で、中国からの観光客の40代の男性が乗用車にはねられ、全身を強く打って間もなく死亡した。車は歩道に乗り上げて歩いていた男性(69)もはね、男性は頭を強く打って意識不明の重体。丸の内署は、運転していた東京都港区白金3丁目、会社役員高橋延拓容疑者(24)を自動車運転過失傷害の疑いで現行犯逮捕し、容疑を同致死に切り替えて調べている。 6-Nichi gogo 4-ji 35-fun-goro, Tōkyō-to Chiyoda-ku Kōkyogaien no todō (uchibori-dōri) no Nijūbashi zen kōsaten de, Chūgoku kara no kankō kyaku no 40-dai no dansei ga jōyōsha ni hane rare, zenshin o tsuyoku Utte mamonaku shibō shita. Kuruma wa hodō ni noriagete aruite ita dansei (69) mo hane, dansei wa atama o tsuyoku utte ishiki fumei no jūtai. Marunouchi-sho wa, unten shite ita Tōkyō-to Minato-ku hakkin 3-chōme, kaisha yakuin Takahashi nobe Tsubuse yōgi-sha (24) o jidōsha unten kashitsu shōgai no utagai de genkō-han taiho shi, yōgi o dō chishi ni kirikaete shirabete iru.
 同署によると、死亡した男性は横断歩道を歩いて渡っていたところを直進してきた車にはねられた。車は左に急ハンドルを切り、車道と歩道の境に置かれた仮設のさくをはね上げ、歩道に乗り上げたという。さくは歩道でランニングをしていた男性(34)に当たり、男性は両足に軽いけが。 Dōsho ni yoru to, shibō shita dansei wa ōdan hodō o aruite watatte ita tokoro o chokushin shite kita kuruma ni hane rareta. Kuruma wa hidari ni kyū handoru o kiri, shadō to hodō no sakai ni oka reta kasetsu no saku o haneage, hodō ni noriageta toyuu. Saku wa hodō de ran’ningu o shite ita dansei (34) niatari, dansei wa ryōashi ni karui kega.
 同署は、死亡した男性の身元確認を進めるとともに、当時の交差点の信号の状況を調べている。 Dōsho wa, shibō shita dansei no mimoto kakunin o susumeru totomoni, tōji no kōsaten no shingō no jōkyō o shirabete iru.
 現場周辺は東京観光のスポットの一つだが、最近はジョギングを楽しむ人も増えている。 Genba shūhen wa Tōkyō kankō no supotto no hitotsudaga, saikin wa jogingu o tanoshimu hito mo fuete iru.

Notes:

  • The use of numerals dodges a plethora of errors, but “6-Nichi” is still wrong for Muika.
  • Lots of correct capitalizations have been added, but “uchibori” was missed and “Utte” capitalized by mistake.
  • Some false spaces or lack of spaces persist: “hane rare”, “oka reta”; “hitotsudaga” and “niatari” were correctly hitotsu da ga and ni atari in the original test.
  • Names still get butchered (“hakkin” for Shirogane, “nobe Tsubuse” for Nobuhiro).
  • The needless apostrophe in “ran’ningu” is still there.
  • Interestingly, “toyuu” is a new error: it should be to iu.
  • There’s evidence of some attempt to use hyphens, but why not in “kankō kyaku” or “Nijūbashi zen”?
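
The first note turns on the fact that days of the month mostly have irregular readings, so a numeral plus "-nichi" is not a safe default. A minimal Python sketch of a lookup follows. The table name and function name are hypothetical, the input is assumed to be a calendar date rather than a count of days, and this is not a description of how Google handles the problem.

```python
# Irregular Hepburn readings for days of the month (calendar dates only).
# Days not listed here are read as numeral + "nichi".
IRREGULAR_DAY_READINGS = {
    1: "tsuitachi",
    2: "futsuka",
    3: "mikka",
    4: "yokka",
    5: "itsuka",
    6: "muika",
    7: "nanoka",
    8: "yōka",
    9: "kokonoka",
    10: "tōka",
    14: "jūyokka",
    20: "hatsuka",
    24: "nijūyokka",
}


def read_day_of_month(day: int) -> str:
    """Return a reading for a calendar day (hypothetical helper)."""
    if not 1 <= day <= 31:
        raise ValueError("day must be between 1 and 31")
    # Assumption: for regular days we keep the numeral and add "-nichi",
    # mirroring the hyphenated style of the newer Google output.
    return IRREGULAR_DAY_READINGS.get(day, f"{day}-nichi")


print(read_day_of_month(6))   # irregular reading from the table
print(read_day_of_month(15))  # regular fallback
```

The hard part is not the table but knowing that 日 after a numeral is a date at all, which depends on context.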

So, to update: Google gets kudos for conscientiousness, but I stick by my original comments.

For more by Prof. Unger, see Pinyin.info’s recommended readings, which include selections from The Fifth Generation Fallacy: Why Japan Is Betting Its Future on Artificial Intelligence, Literacy and Script Reform in Occupation Japan: Reading Between the Lines, and Ideogram: Chinese Characters and the Myth of Disembodied Meaning.

Banqiao — the Xinbei ways

Xinbei, formerly known as Taipei County and now officially bearing the atrocious English name of “New Taipei City,” has made available an online map of its territory.

Interestingly, the map is available not just in Mandarin with traditional Chinese characters and English with Hanyu Pinyin (most of the time — but more on that soon) but also in Mandarin with simplified Chinese characters. A Japanese interface is also available.

The interface for all versions opens to a map centered on Xinbei City Hall. What struck me upon seeing this for the first time was that, in just one small section, Banqiao is spelled four different ways:

  • Banqiao (Hanyu Pinyin)
  • Panchiao (bastardized Wade-Giles)
  • Ban-Chiau (MPS2, with an added hyphen)
  • Banciao (Tongyong Pinyin)

Click the map to see an enlargement.

I want to stress that these are not typos. These are the result of an inattention to detail that is all too common here.

The spelling for the city, er, district is also wrong in the interface, with Tongyong used. Since Banqiao is the seat of the Xinbei City Government and has more than half a million inhabitants,* it’s not exactly so obscure that spelling its name correctly should be much of a challenge. Tongyong and other systems also crop up in some other names outside the interface.

It should be admitted, however, that the Xinbei map’s romanization is still better overall than the error-filled mess issued by Google.

*: including me

Google Translate and rōmaji

The following is a guest post by Professor J. Marshall Unger of the Ohio State University’s Department of East Asian Languages and Literatures.

The challenge

On 18 November 2009, Mark Swofford posted an item on his website pinyin.info criticizing the way Google Translate produces Hanyu Pinyin from standard Chinese text. He concluded by saying, “Google Translate will also romanize Japanese texts written in kanji and kana, Russian texts written in Cyrillic, etc. But I’ll leave those to others to analyze.” So I decided to take up Swofford’s challenge as it pertains to Japanese. Using Google Translate, I romanized a news item from the Asahi of 6 December 2009:

Original Google Translate
6日午後4時35分ごろ、東京都千代田区皇居外苑の都道(内堀通り)の二重橋前交差点で、中国からの観光客の40代の男性が乗用車にはねられ、全身を強く打って間もなく死亡した。車は歩道に乗り上げて歩いていた男性(69)もはね、男性は頭を強く打って意識不明の重体。丸の内署は、運転していた東京都港区白金3丁目、会社役員高橋延拓容疑者(24)を自動車運転過失傷害の疑いで現行犯逮捕し、容疑を同致死に切り替えて調べている。 roku nichi gogo yon ji san go fun goro , tōkyō to chiyoda ku kōkyogaien no todō ( uchibori dōri ) no nijūbashi zen kōsaten de , chūgoku kara no kankō kyaku no yon zero dai no dansei ga jōyōsha ni hane rare , zenshin wo tsuyoku u~tsu te mamonaku shibō shi ta . kuruma wa hodō ni noriage te arui te i ta dansei ( roku kyū ) mo hane , dansei wa atama wo tsuyoku u~tsu te ishiki fumei no jūtai . marunouchi sho wa , unten shi te i ta tōkyō to minato ku hakkin san chōme , kaisha yakuin takahashi nobe tsubuse yōgi sha ( ni yon ) wo jidōsha unten kashitsu shōgai no utagai de genkō han taiho shi , yōgi wo dō chishi ni kirikae te shirabe te iru .
 同署によると、死亡した男性は横断歩道を歩いて渡っていたところを直進してきた車にはねられた。車は左に急ハンドルを切り、車道と歩道の境に置かれた仮設のさくをはね上げ、歩道に乗り上げたという。さくは歩道でランニングをしていた男性(34)に当たり、男性は両足に軽いけが。 dōsho ni yoru to , shibō shi ta dansei wa ōdan hodō wo arui te wata~tsu te i ta tokoro wo chokushin shi te ki ta kuruma ni hane rare ta . kuruma wa hidari ni kyū handoru wo kiri , shadō to hodō no sakai ni oka re ta kasetsu no saku wo haneage , hodō ni noriage ta to iu . saku wa hodō de ran’ningu wo shi te i ta dansei ( san yon ) ni atari , dansei wa ryōashi ni karui kega .
 同署は、死亡した男性の身元確認を進めるとともに、当時の交差点の信号の状況を調べている。 dōsho wa , shibō shi ta dansei no mimoto kakunin wo susumeru totomoni , tōji no kōsaten no shingō no jōkyō wo shirabe te iru .
 現場周辺は東京観光のスポットの一つだが、最近はジョギングを楽しむ人も増えている。 genba shūhen wa tōkyō kankō no supotto no hitotsu da ga , saikin wa jogingu wo tanoshimu hito mo fue te iru .

Google’s romanization algorithm does a thoroughly mediocre job compared with what a human transcriber would do. To see this, compare the following:

Google Translate human transcriber
roku nichi gogo yon ji san go fun goro , tōkyō to chiyoda ku kōkyogaien no todō ( uchibori dōri ) no nijūbashi zen kōsaten de , chūgoku kara no kankō kyaku no yon zero dai no dansei ga jōyōsha ni hane rare , zenshin wo tsuyoku u~tsu te mamonaku shibō shi ta . kuruma wa hodō ni noriage te arui te i ta dansei ( roku kyū ) mo hane , dansei wa atama wo tsuyoku u~tsu te ishiki fumei no jūtai . marunouchi sho wa , unten shi te i ta tōkyō to minato ku hakkin san chōme , kaisha yakuin takahashi nobe tsubuse yōgi sha ( ni yon ) wo jidōsha unten kashitsu shōgai no utagai de genkō han taiho shi , yōgi wo dō chishi ni kirikae te shirabe te iru . Muika gogo yo-ji sanjūgo-fun goro, Tōkyō-to Chiyoda-ku Kōkyo Gaien no todō (Uchibori dōri) no Nijūbashi-zen kōsaten de, Chūgoku kara no kankō-kyaku no yonjū-dai no dansei ga jōyōsha ni hanerare, zenshin o tsuyoku utte mamonaku shibō-shita. Kuruma wa hodō ni noriagete aruite ita dansei (rokujūkyū) mo hane, dansei wa atama o tsuyoku utte ishiki fumei no jūtai. Marunouchi-sho wa, unten-shite ita Tōkyō-to Minato-ku Shirogane san-chōme, kaisha yakuin Takahashi Nobuhiro yōgisha (nijūyon) o jidōsha unten kashitsu shōgai no utagai de genkōhan taiho-shi, yōgi o dō-chishi ni kirikaete shirabete iru.
dōsho ni yoru to , shibō shi ta dansei wa ōdan hodō wo arui te wata~tsu te i ta tokoro wo chokushin shi te ki ta kuruma ni hane rare ta . kuruma wa hidari ni kyū handoru wo kiri , shadō to hodō no sakai ni oka re ta kasetsu no saku wo haneage , hodō ni noriage ta to iu . saku wa hodō de ran’ningu wo shi te i ta dansei ( san yon ) ni atari , dansei wa ryōashi ni karui kega . Dō-sho ni yoru to, shibō-shita dansei wa ōdan hodō o aruite watatte ita tokoro o chokushin-shite kita kuruma ni hanerareta. Kuruma wa hidari ni kyū-handoru o kiri, shadō to hodō no sakai ni okareta kasetsu no saku o haneage, hodō ni noriageta to iu. Saku wa hodō de ranningu o shite ita dansei (sanjūyon) ni atari, dansei wa ryōashi ni karui kega.
dōsho wa , shibō shi ta dansei no mimoto kakunin wo susumeru totomoni , tōji no kōsaten no shingō no jōkyō wo shirabe te iru . Dō-sho wa, shibō-shita dansei no mimoto kakunin o susumeru to tomo ni, tōji no kōsaten no shingō no jōkyō o shirabete iru.
genba shūhen wa tōkyō kankō no supotto no hitotsu da ga , saikin wa jogingu wo tanoshimu hito mo fue te iru . Genba shūhen wa Tōkyō kankō no supotto no hitotsu da ga, saikin wa jogingu o tanoshimu hito mo fuete iru.

For the sake of comparison, I have retained Google’s Hepburn-style romanization. The following changes have been made in the text in the righthand column:

  1. Misread words have been rewritten. Many involve numerals; e.g. muika for “roku nichi”, yo-ji for “yon ji”, sanjūgo-fun for “san go fun”. The personal name Nobuhiro is an educated guess, but “Nobetsubuse” is certainly wrong. Shirogane for “hakkin” is a place-name (N.B. Google did not produce *hakukin, indicating that the algorithm does more than just character-by-character on-yomi).
  2. False spaces and consequent misreadings have been eliminated. E.g. hanerare for “hane rare”, watatte ita for “wata~tsu te i ta”.
  3. Run-together phrases have been parsed correctly. E.g. to tomo ni for “totomoni”.
  4. Capitalization of proper nouns and the first words in sentences has been introduced.
  5. Hyphens are used conservatively for prefixes and suffixes, and for compound verbs with suru.
  6. Obsolete “wo” for the particle o has been eliminated. (N.B. Google did not produce *ha for the particle wa, so “wo” for o is just the result of laziness.)
  7. Apostrophes after n to indicate mora nasals in positions where they are not needed have been eliminated.
  8. Punctuation has been normalized to match romanized format, and paragraph indentations have been restored.
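
Two of the changes listed above, items 6 and 7, are mechanical enough to sketch in code. The Python below is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be space-separated Hepburn output like Google's, and the apostrophe rule assumed here is that an apostrophe after n is needed only before a vowel or y.

```python
import re

# Letters before which an apostrophe after "n" is still needed in Hepburn
# (assumption: vowels, macron vowels, and y).
VOWELS_AND_Y = "aiueoāīūēōy"


def drop_needless_apostrophes(text: str) -> str:
    """Remove an apostrophe after n unless a vowel or y follows it."""
    pattern = rf"(?<=n)['’](?![{VOWELS_AND_Y}])"
    return re.sub(pattern, "", text, flags=re.IGNORECASE)


def modernize_particle_o(text: str) -> str:
    """Rewrite the free-standing token "wo" as "o"."""
    return re.sub(r"\bwo\b", "o", text)


sample = "saku wa hodō de ran’ningu wo shi te i ta dansei"
cleaned = modernize_particle_o(drop_needless_apostrophes(sample))
print(cleaned)
```

The other items in the list (misread words, false spaces, run-together phrases) need lexical knowledge and are not touched by this sketch.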

One could make the romanized text more easily readable by restoring arabic numerals, italicizing gairaigo, and so on. Of course, if the reporter knew that his/her copy would be reported orally or in romanization, s/he might have chosen different wording to avoid homophonic ambiguities. E.g., Marunouchi-sho could be Marunouchi Keisatsu-sho, though perhaps in the context of a traffic accident story, it is obvious that the suffix sho denotes ‘police station’. Furthermore, in a digraphic Japan, homophones might not be such a great problem. If, for instance, readers were accustomed to seeing dōsho for 同所 ‘same place’, then dō-sho would immediately signal that something different was meant, which, given context, might be entirely sufficient to eliminate misunderstanding.

But having said all that, my guess is that the romanization function of Google Translate was programmed with some care. Rather than criticize the quality of Google’s algorithm, I suggest pursuing the logical consequences of assuming that it deserves about a B+ by current standards.

Analysis

Clearly, there is a vast amount of knowledge an editor needs if s/he wants to bring Google’s result up to an acceptable level of romanization for human consumption. That minimal level, in turn, is probably a far cry from what a committee of linguists might decide would be an ideal romanization for daily use in 21st-century Japan. It is quite obvious why Google’s algorithm blunders (the reasons were well understood and described long ago, e.g. in Unger 1987), and though the algorithm can be improved, it can never produce perfect results. Computers cannot read minds, and mindreading is ultimately what it would take to produce a flawless romanization.[1]

Furthermore, imagine the representation of the words of the text that presumably takes shape in some form or other in the mind of the skilled reader of the original text. Given that Google’s programmers are doing their best to get their computers to identify words and their forms from Japanese textual data, it is clear that readers, who achieve excellent comprehension with little or no conscious effort, must be doing vastly more. The sequence of stages — from (1) the original text to (2) the Google transcription, (3) the better edited version, (4) some future “ideal” romanization scheme, and onward to (5) whatever the brain of the skilled reader ultimately distills and comprehends — concretely illustrates how, at each stage, different kinds of information — from the easily programmable to genuine expert knowledge — must be brought to bear on the raw data.

Of course, something similar can be said of English texts as well: like Chinese characters, orthographic words of English, even though written with letters of the roman alphabet, typically function both logographically and phonographically. The English reader has to do some work too. But how much? Think of the sequence of stages just described in reverse order. The step from the mind of an expert reader (5) to an ideal romanization (4) is short compared with the distance down to the crude level of romanization produced by Google Translate (2). Yet Google does quite a bit relative to the original text (1). It does not totally fail, but rather makes mistakes, which, as just demonstrated, a human editor can identify and correct. It manages to find many word boundaries and no doubt could do better if the company’s programmers consulted some linguists and exerted themselves more. The point is that Japanese readers must cover the whole distance from the text to genuine comprehension, a distance that must be much greater than that traversed by the practiced reader of English, for all its quaint anachronistic spellings. With a decent, standardized roman orthography, the Japanese reader would have a considerably shorter distance to negotiate.

Note

  1. Indeed, starting in the 1980s, Asahi pioneered the use of an IBM-designed system called NELSON (New Editing and Layout System of Newspapers) that uses large-array keyboards (descriptive input) rather than the sort of kanji henkan methods (transcriptive input) common on personal computers and dedicated word-processing systems. Consequently, the expedient of storing the underlying roman or kana input stream alongside the selected characters is not available for Asahi stories. Of course, such information is routinely thrown away by many other input systems too.

Journal issue focuses on romanization

The most recent issue of the Journal of the Royal Asiatic Society of Great Britain and Ireland (third series, volume 20, part 1, January 2010) features the following articles on romanization movements and script reforms.

  • Editorial Introduction: Romanisation in Comparative Perspective, by İlker Aytürk
  • The Literati and the Letters: A Few Words on the Turkish Alphabet Reform, by Laurent Mignon
  • Alphabet Reform in the Six Independent ex-Soviet Muslim Republics, by Jacob M. Landau
  • Politics of Romanisation in Azerbaijan (1921–1992), by Ayça Ergun
  • Romanisation in Uzbekistan Past and Present, by Mehmet Uzman
  • Romanisation of Bengali and Other Indian Scripts, by Dennis Kurzon
  • The Rōmaji movement in Japan, by Nanette Gottlieb
  • Postscript from the JRAS Editor, Sarah Ansari

Unfortunately, none of these cover any Sinitic languages or the case of Vietnam. And Gottlieb’s take on rōmaji is certainly more conservative than Unger’s. But I expect this will all make for interesting reading.

I am able to view all of the articles on my system. But perhaps others will run up against a subscription wall.

I thank Victor H. Mair for drawing this publication to my attention.

Google Translate’s new Pinyin function sucks

Google Translate has a new function: conversion to Hanyu Pinyin, which would be exciting and wonderful if it were any good. But unfortunately it’s terrible, all things considered.

What Google has created is about at the same level as scripts hobbyists cobbled together the hard way about a decade ago from early versions of CE-DICT. Don’t get me wrong: I greatly admire what sites such as Ocrat achieved way back when. But for Google — with all of its data, talent, and money — to do essentially no better so many years later is nothing short of a disgrace.

To see Google Translate’s Pinyin function in action you must select “Chinese (Simplified)” or “Chinese (Traditional)” — not English — for the “Translate into” option. And then click on “Show romanization”.

For example, here’s what happens with the following text from an essay on simplified and traditional Chinese characters by Zhang Liqing:

談中國的“語”和“文”的問題,我覺得最好能先了解一下在中國通用的語言。中國的主要語言有哪些?為甚麼我說這個,而不說那個?因為環境?因為被強迫?因為我愛這個語言?因為有必要?因為這個語言很重要?也想想什麼是中國人的共同語言。用一個共同語言有必要嗎?為什麼?別的漢語的去向會怎麼樣?如果你使用中國的共同語言普通話,你了解這個語言的語法(比如“的, 得, 地“ 和“了” 的不同用法)嗎? 知道這個語言的基本音節(不包括聲調)只有408個嗎?


Google Translate will produce this:

tán zhōng guó de“yǔ“hé” wén” de wèn tí, wǒ jué de zuì hǎo néng xiān liǎo jiè yī xià zài zhōng guó tōng yòng de yǔ yán。zhōng guó de zhǔ yào yǔ yán yǒu nǎ xiē?wéi shèn me wǒ shuō zhè ge, ér bù shuō nà gè?yīn wèi huán jìng?yīn wèi bèi qiǎng pò?yīn wèi wǒ ài zhè ge yǔ yán?yīn wèi yǒu bì yào?yīn wèi zhè ge yǔ yán hěn zhòng yào?yě xiǎng xiǎng shén me shì zhōng guó rén de gòng tóng yǔ yán。yòng yī gè gòng tóng yǔ yán yǒu bì yào ma?wèi shé me?bié de hàn yǔ de qù xiàng huì zěn me yàng?rú guǒ nǐ shǐ yòng zhōng guó de gòng tóng yǔ yán pǔ tōng huà, nǐ liǎo jiě zhè ge yǔ yán de yǔ fǎ(bǐ rú“de, de, de“ hé“le” de bù tóng yòng fǎ) ma?zhī dào zhè ge yǔ yán de jī běn yīn jié(bù bāo kuò shēng diào) zhǐ yǒu408gè ma?

Here’s what’s wrong:

  • This is all bro ken syl la bles instead of word parsing. (So it’s never even a question if they get the use of the apostrophe correct.)
  • Proper nouns are not capitalized (e.g., zhōng guó vs. Zhōngguó).
  • The first letter in each sentence is not capitalized.
  • Punctuation is not converted but remains in double-width Chinese style, which is wrong for Pinyin.
  • Spacing around most punctuation is also incorrect (e.g., although a space is added after a comma and a closing parenthesis, there’s no space after a period or a question mark. See also the spacing or lack thereof around quotation marks, numerals, etc.)
  • Because of lack of word parsing, some given pronunciations are wrong.
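
Of the problems listed above, the punctuation and sentence-capitalization ones are the most mechanical. The Python sketch below shows one way they could be handled. The mapping table and function names are hypothetical, the table is not exhaustive, and nothing here is a description of Google's code. It does not attempt word parsing, proper nouns, or quotation marks, and its spacing rule would wrongly split decimals such as "4.3".

```python
import re

# Hypothetical, non-exhaustive mapping from full-width (Chinese-style)
# punctuation to the half-width forms used in Pinyin text.
FULLWIDTH_TO_HALFWIDTH = {
    "，": ",",
    "。": ".",
    "？": "?",
    "！": "!",
    "（": "(",
    "）": ")",
    "：": ":",
    "；": ";",
}


def normalize_punctuation(text: str) -> str:
    """Replace full-width punctuation and tidy the spacing around it."""
    for wide, narrow in FULLWIDTH_TO_HALFWIDTH.items():
        text = text.replace(wide, narrow)
    # Remove whitespace before closing punctuation.
    text = re.sub(r"\s+([,.?!:;)])", r"\1", text)
    # Add a space after punctuation that runs straight into more text.
    text = re.sub(r"([,.?!:;])(?=[^\s)])", r"\1 ", text)
    # Add a space before "(" and after ")" where text touches them.
    text = re.sub(r"(?<=\S)\(", " (", text)
    text = re.sub(r"\)(?=\w)", ") ", text)
    return text


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of the text and of each new sentence."""
    return re.sub(
        r"(^|[.?!]\s+)(\w)",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


sample = "zhōng guó de zhǔ yào yǔ yán yǒu nǎ xiē？yīn wèi huán jìng？yě xiǎng xiǎng。"
result = capitalize_sentences(normalize_punctuation(sample))
print(result)
```

The output still has broken syllables and an uncapitalized "zhōng guó", which is the point: those need a lexicon, not string cleanup.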

In my previous post I complained about Google Maps’ unfortunately botched switch to Hanyu Pinyin. I stated there that, unlike Google Maps, Google Translate would correctly produce “Chengdu” from “成都” (which it does when “translate into” is set for English). But I see that the romanization bug, er, feature of Google Translate also fails this simple test. It generates the incorrect “chéng dōu”.

All of this indicates that Google apparently is using a poor database and not only has no idea of how Pinyin is meant to be written but also lacks an understanding of even the basic rules of Pinyin.

If you should need to use a free Web-based Pinyin converter, avoid Google Translate. Instead use Adso (from the fine folk at Popup Chinese) or perhaps NCIKU or MDBG — all of which, despite their limitations (c’mon, guys, sentences begin with capital letters), are significantly better than what Google offers.

By the way, Google Translate will also romanize Japanese texts written in kanji and kana, Russian texts written in Cyrillic, etc. But I’ll leave those to others to analyze.

For lagniappe, here’s a real Hanyu Pinyin version of the text above:

Tán Zhōngguó de “yǔ” hé “wén” de wèntí, wǒ juéde zuìhǎo néng xiān liǎojiě yīxià zài Zhōngguó tōngyòng de yǔyán. Zhōngguó de zhǔyào yǔyán yǒu nǎxiē? Wèishénme wǒ shuō zhège, ér bù shuō nàge? Yīnwei huánjìng? Yīnwei bèi qiǎngpò? Yīnwei wǒ ài zhège yǔyán? Yīnwei yǒu bìyào? Yīnwei zhège yǔyán hěn zhòngyào? Yě xiǎngxiang shénme shì Zhōngguórén de gòngtóng yǔyán? Yòng yīge gòngtóng yǔyán yǒu bìyào ma? Wèishénme? Biéde Hànyǔ de qùxiàng huì zěnmeyàng? Rúguǒ nǐ shǐyòng Zhōngguó de gòngtóng yǔyán Pǔtónghuà, nǐ liǎojiě zhège yǔyán de yǔfǎ (bǐrú “de” hé “le” de bùtóng yòngfǎ) ma? Zhīdao zhège yǔyán de jīběn yīnjié (bù bāokuò shēngdiào) zhǐ yǒu 408 ge ma?

angling through dictionaries

The most recent rerelease from Sino-Platonic Papers is Tiao-Fish through Chinese Dictionaries (4.3 MB PDF), by Michael Carr.

The tiáo < d’ieu < *d’iôg fish, a classical Chinese happiness metaphor, has been contradictorily identified as a chub, culter, dace, eel, goby, hairtail, hemiculter, loach, mullet, paddlefish, and pike. This paper illustrates the history of Chinese lexicography by comparing tiáo definitions from thirty-five Chinese monolingual dictionaries with tiáo translation equivalents from sixteen Japanese and seventeen Western language bilingual ones.

As Carr explains, “The tiáo fish provides a historical microcosm of Chinese lexicography because every principal dictionary defines it, and because *DZIOG’s multifarious pronunciations and writings illustrate some unique linguistic problems in Chinese dictionaries.”

This was first published in September 1993 as issue no. 40 of Sino-Platonic Papers.


kanji scandal

The Kyoto-based Japan Kanji Aptitude Testing Foundation, the group that is behind the Kanji of the Year announcement and that runs Japan’s well-attended kanji aptitude tests, is registered as a public-interest corporation, which means that it is not supposed to generate profits greater than it needs to operate (much like a non-profit organization in the United States). On March 10, however, Japan’s Ministry of Education stepped in, saying that the foundation was making too much money and needed to overhaul its operations.

How much money are we talking about?

The foundation racked up profits of ¥880 million [US$8.8 million] in fiscal 2006 and ¥660 million in fiscal 2007. The value of its assets increased from ¥5 billion at the end of fiscal 2004 to ¥7.35 billion at the end of fiscal 2007. It would not be far-fetched to say that the foundation has created a kanji business. Kanken became a registered trademark. In fiscal 2007 alone, the foundation sold some 1.5 million copies of books. It is also providing kanji-related questions to TV shows.

But the problems go beyond how much money the foundation makes. It has been funneling money into companies controlled by the foundation’s director and his son, the deputy director. “In fiscal 2007, commissions to these companies amounted to 2.48 billion yen [US$24.9 million], accounting for about 40 percent of the foundation’s annual expenditures,” the Asahi Shimbun reported.

Moreover, it appears the companies did little work for the large amount of money they received.

The Ministry of Education has warned the foundation before, with not much in the way of results. The foundation is to report back to the ministry by April 15. Given how entrenched the foundation is within Japan, I don’t expect much to change.

sources:

foreign languages in NZ secondary schools

New Zealand’s Ministry of Education has released figures on secondary school enrollments in foreign languages in 2007, according to a newspaper report.

Education Ministry figures show nearly 70,000 pupils studied foreign languages at secondary schools last year, with 27,284 learning French.

Japanese was also popular (18,440), followed by Spanish (9531) and German (6623).

Chinese… attracted just 1687 pupils.

The total of those figures (63,565) seems considerably shy of “nearly 70,000.” So I suspect some languages more popular than Mandarin have been left off the list. Either way, Mandarin takes only about 2.5 percent of the total. And no indication is given of what percentage of those are “heritage” students.
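
The arithmetic behind that paragraph can be checked with a few lines of Python. The enrollment figures are the ones quoted above. The variable names are just for illustration, and 70,000 is used only as an upper bound for the report's "nearly 70,000".

```python
# Enrollment figures as quoted in the newspaper report.
enrollments = {
    "French": 27_284,
    "Japanese": 18_440,
    "Spanish": 9_531,
    "German": 6_623,
    "Chinese": 1_687,
}

listed_total = sum(enrollments.values())

# Assumption: "nearly 70,000" is treated as at most 70,000.
reported_total_upper_bound = 70_000

# Mandarin's share of the listed languages and of the reported total.
share_of_listed = enrollments["Chinese"] / listed_total * 100
share_of_reported = enrollments["Chinese"] / reported_total_upper_bound * 100

print(listed_total)
print(round(share_of_listed, 2), round(share_of_reported, 2))
```

Mandarin's share falls between roughly 2.4 and 2.7 percent depending on which total is used, which is why "about 2.5 percent" holds either way.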

That’s a lot of kids taking Japanese, though. Can anyone familiar with the situation in New Zealand comment on that?

I wasn’t able to locate the source of these figures. I did, however, find some figures from ten years ago, though they don’t include Mandarin. Also, I don’t understand the categories. But, FWIW:

Numbers of students studying second languages, July 1998

language    secondary learners    primary & intermediate learners
Japanese    21,701                13,625
French      20,990                8,413
German      7,377                 3,877
Spanish     2,247                 5,172

A few more lines from the 2008 report:

Under the new curriculum, schools must be “working toward” offering pupils in years 7 to 10 the option of learning a second language from 2011, in a push to make more Kiwis bilingual.

However, the ministry says it is up to schools and their communities to choose which languages are offered – meaning French is likely to remain popular.

A ministry spokesman said measures were underway to boost teachers’ ability to teach a variety of foreign languages in schools.

They included Maori medium scholarship and overseas exchange programmes.

sources:

further reading: