Google Translate’s new Pinyin function sucks

Google Translate has a new function: conversion to Hanyu Pinyin, which would be exciting and wonderful if it were any good. But unfortunately it’s terrible, all things considered.

What Google has created is roughly on the level of the scripts that hobbyists cobbled together the hard way about a decade ago from early versions of CE-DICT. Don’t get me wrong: I greatly admire what sites such as Ocrat achieved way back when. But for Google, with all of its data, talent, and money, to do essentially no better so many years later is nothing short of a disgrace.

To see Google Translate’s Pinyin function in action, you must select “Chinese (Simplified)” or “Chinese (Traditional)”, not English, for the “Translate into” option, and then click on “Show romanization”.

For example, here’s what happens with the following text from an essay on simplified and traditional Chinese characters by Zhang Liqing:

談中國的“語”和“文”的問題,我覺得最好能先了解一下在中國通用的語言。中國的主要語言有哪些?為甚麼我說這個,而不說那個?因為環境?因為被強迫?因為我愛這個語言?因為有必要?因為這個語言很重要?也想想什麼是中國人的共同語言。用一個共同語言有必要嗎?為什麼?別的漢語的去向會怎麼樣?如果你使用中國的共同語言普通話,你了解這個語言的語法(比如“的, 得, 地“ 和“了” 的不同用法)嗎? 知道這個語言的基本音節(不包括聲調)只有408個嗎?

screenshot of Google Translate with the text above

Google Translate will produce this:
screenshot of Google Translate with the text above and how Google Translate puts this into Pinyin (see text below)

tán zhōng guó de“yǔ“hé” wén” de wèn tí, wǒ jué de zuì hǎo néng xiān liǎo jiè yī xià zài zhōng guó tōng yòng de yǔ yán。zhōng guó de zhǔ yào yǔ yán yǒu nǎ xiē?wéi shèn me wǒ shuō zhè ge, ér bù shuō nà gè?yīn wèi huán jìng?yīn wèi bèi qiǎng pò?yīn wèi wǒ ài zhè ge yǔ yán?yīn wèi yǒu bì yào?yīn wèi zhè ge yǔ yán hěn zhòng yào?yě xiǎng xiǎng shén me shì zhōng guó rén de gòng tóng yǔ yán。yòng yī gè gòng tóng yǔ yán yǒu bì yào ma?wèi shé me?bié de hàn yǔ de qù xiàng huì zěn me yàng?rú guǒ nǐ shǐ yòng zhōng guó de gòng tóng yǔ yán pǔ tōng huà, nǐ liǎo jiě zhè ge yǔ yán de yǔ fǎ(bǐ rú“de, de, de“ hé“le” de bù tóng yòng fǎ) ma?zhī dào zhè ge yǔ yán de jī běn yīn jié(bù bāo kuò shēng diào) zhǐ yǒu408gè ma?

Here’s what’s wrong:

  • This is all bro ken syl la bles instead of word parsing. (So the question of whether they use the apostrophe correctly never even arises.)
  • Proper nouns are not capitalized (e.g., zhōng guó vs. Zhōngguó).
  • The first letter in each sentence is not capitalized.
  • Punctuation is not converted but remains in double-width Chinese style, which is wrong for Pinyin.
  • Spacing around most punctuation is also incorrect (e.g., although a space is added after a comma and a closing parenthesis, there’s no space after a period or a question mark. See also the spacing or lack thereof around quotation marks, numerals, etc.)
  • Because of the lack of word parsing, some of the pronunciations given are wrong (e.g., liǎo jiè for 了解, which should be liǎojiě).
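To make the list above concrete, here is a minimal sketch in Python of the final formatting step that Google’s output skips. It assumes the hard part has already been done: the text has been parsed into words, each word has the right reading, and proper nouns such as Zhōngguó already come capitalized from the dictionary. The token format and the name `format_pinyin` are hypothetical, chosen only for illustration.

```python
# Full-width Chinese punctuation mapped to the forms used in running Pinyin text.
# (Curly quotation marks look the same in both, so they need no entry.)
FULL_TO_HALF = {"，": ",", "。": ".", "？": "?", "！": "!",
                "：": ":", "；": ";", "（": "(", "）": ")"}

# Marks that attach to the preceding word and are followed by a space.
CLOSING = {",", ".", "?", "!", ":", ";", ")", "”"}
# Marks that are preceded by a space and attach to the following word.
OPENING = {"(", "“"}
# Marks after which the next word begins a new sentence.
SENTENCE_END = {".", "?", "!"}


def format_pinyin(tokens):
    """Join word-parsed Pinyin tokens into properly spaced, capitalized text.

    Hypothetical input: a list whose items are either whole Pinyin words
    (proper nouns already capitalized) or single punctuation marks.
    """
    parts = []
    capitalize_next = True  # the first word of the text starts a sentence
    no_space_next = True    # no space before the very first token
    for token in tokens:
        token = FULL_TO_HALF.get(token, token)
        if token in CLOSING:
            parts.append(token)
            no_space_next = False
            if token in SENTENCE_END:
                capitalize_next = True
        elif token in OPENING:
            if not no_space_next:
                parts.append(" ")
            parts.append(token)
            no_space_next = True
        else:
            if capitalize_next:
                token = token[:1].upper() + token[1:]
                capitalize_next = False
            if not no_space_next:
                parts.append(" ")
            parts.append(token)
            no_space_next = False
    return "".join(parts)
```

For example, the tokens for “中國的主要語言有哪些？為甚麼？” (zhōngguó parsed as the word Zhōngguó, and so on) come out as Zhōngguó de zhǔyào yǔyán yǒu nǎxiē? Wèishénme? with sentence-initial capitals and half-width punctuation.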

In my previous post I complained about Google Maps’ unfortunately botched switch to Hanyu Pinyin. I stated there that, unlike Google Maps, Google Translate would correctly produce “Chengdu” from “成都” (which it does when “Translate into” is set to English). But I see that the romanization feature (bug, really) of Google Translate also fails this simple test: it generates the incorrect “chéng dōu”.

All of this suggests that Google is using a poor database and lacks an understanding of even the basic rules of how Pinyin is meant to be written.
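The Chengdu example shows why character-by-character lookup fails: 都 is usually read dōu, but in 成都 it is dū, and only a word-level dictionary knows that. Below is a minimal sketch of forward maximum matching, one simple (and far from state-of-the-art) way of segmenting text against a dictionary. The tiny word and character lists are hypothetical stand-ins; a real converter would need a full lexicon such as CC-CEDICT plus a better way to resolve ambiguous segmentations.

```python
# Hypothetical mini-lexicon of multi-character words and their Pinyin.
WORDS = {
    "成都": "Chéngdū",
    "中國": "Zhōngguó",
    "了解": "liǎojiě",
    "語言": "yǔyán",
    "為什麼": "wèishénme",
}
# Hypothetical single-character readings: one default reading per character,
# which is all a character-by-character converter has to go on.
CHARS = {
    "成": "chéng", "都": "dōu", "中": "zhōng", "國": "guó", "了": "le",
    "解": "jiě", "語": "yǔ", "言": "yán", "為": "wèi", "什": "shén",
    "麼": "me", "的": "de",
}
MAX_WORD_LEN = max(len(word) for word in WORDS)


def naive_syllables(text):
    """Look up each character on its own, ignoring word boundaries."""
    return [CHARS.get(ch, ch) for ch in text]


def to_pinyin_words(text):
    """Segment text by forward maximum matching; return one Pinyin string per word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest multi-character dictionary entry starting at i first.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + length]
            if chunk in WORDS:
                words.append(WORDS[chunk])
                i += length
                break
        else:
            # No multi-character word matched: fall back to the single character.
            words.append(CHARS.get(text[i], text[i]))
            i += 1
    return words
```

Here `naive_syllables("成都")` gives chéng and dōu, the same mistake Google makes, while `to_pinyin_words("成都")` gives Chéngdū.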

If you should need to use a free Web-based Pinyin converter, avoid Google Translate. Instead use Adso (from the fine folk at Popup Chinese) or perhaps NCIKU or MDBG — all of which, despite their limitations (c’mon, guys, sentences begin with capital letters), are significantly better than what Google offers.

By the way, Google Translate will also romanize Japanese texts written in kanji and kana, Russian texts written in Cyrillic, etc. But I’ll leave those to others to analyze.

For lagniappe, here’s a real Hanyu Pinyin version of the text above:

Tán Zhōngguó de “yǔ” hé “wén” de wèntí, wǒ juéde zuìhǎo néng xiān liǎojiě yīxià zài Zhōngguó tōngyòng de yǔyán. Zhōngguó de zhǔyào yǔyán yǒu nǎxiē? Wèishénme wǒ shuō zhège, ér bù shuō nàge? Yīnwei huánjìng? Yīnwei bèi qiǎngpò? Yīnwei wǒ ài zhège yǔyán? Yīnwei yǒu bìyào? Yīnwei zhège yǔyán hěn zhòngyào? Yě xiǎngxiang shénme shì Zhōngguórén de gòngtóng yǔyán. Yòng yīge gòngtóng yǔyán yǒu bìyào ma? Wèishénme? Biéde Hànyǔ de qùxiàng huì zěnmeyàng? Rúguǒ nǐ shǐyòng Zhōngguó de gòngtóng yǔyán Pǔtónghuà, nǐ liǎojiě zhège yǔyán de yǔfǎ (bǐrú “de, de, de” hé “le” de bùtóng yòngfǎ) ma? Zhīdao zhège yǔyán de jīběn yīnjié (bù bāokuò shēngdiào) zhǐ yǒu 408 ge ma?

Google Maps switches to Hanyu Pinyin for Taiwan (sloppily)

Until very recently, Google Maps gave street names in Taiwan in Tongyong Pinyin, most of the time at least. This was the case even for Taipei, which has long used Hanyu Pinyin, not Tongyong Pinyin. Romanization on Google’s maps of Taiwan was a real hodgepodge. It’s still something of a mess, but now it’s at least more consistent, and more consistent in Hanyu Pinyin.

First the good. In Google Maps:

  • Hanyu Pinyin, not Tongyong Pinyin, is now used for street names throughout Taiwan.
  • Tone marks are indicated. (Previous maps with Tongyong did not indicate tones.)

Now the bad, and unfortunately there’s a lot of it and it’s very bad indeed:

  • The Hanyu Pinyin is given as Bro Ken Syl La Bles. (Terrible! Also, this is a new style for Google Maps. Street names in Tongyong were styled properly: e.g., Minsheng, not Min Sheng.)
  • The names of MRT stations remain incorrectly presented. For example, the station referred to in all MRT stations and on all MRT maps as “NTU Hospital” is instead labeled in broken Pinyin as “Tái Dà Yī Yuàn” (in proper Pinyin this would be Tái-Dà Yīyuàn); and “Xindian City Hall” (or, bleah, “Office”) is marked as Xīn Diàn Shì Gōng Suǒ (in proper Pinyin: “Xīndiàn Shìgōngsuǒ” or perhaps “Xīndiàn Shì Gōngsuǒ”). Most, but not all, MRT station names were already presented in this incorrect way (in Hanyu Pinyin rather than Tongyong) in Google Maps.
  • Errors in romanization point to sloppy conversions. For example, an MRT station in Banqiao is labeled Xīn Bù rather than as Xīnpǔ. (埔 is one of those many Chinese characters with multiple Mandarin pronunciations.)
  • Tongyong Pinyin is still used in the names of most cities and townships (e.g., Banciao, not Banqiao).

Screenshot from earlier this evening, showing that Tongyong Pinyin is still being used in Google Maps for some city and district names (e.g., Gueishan, Sinjhuang, Banciao, Jhonghe, Sindian, and Jhongjheng rather than Hanyu Pinyin’s Guishan, Xinzhuang, Banqiao, Zhonghe, Xindian, and Zhongzheng, respectively).
map of Taipei area, with names as shown above
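The Tongyong and Hanyu spellings in the caption above differ in only a few syllables (sin/xin, ciao/qiao, jh/zh, guei/gui), so a syllable-by-syllable lookup is enough to convert these particular names. This is a minimal sketch: the table covers only the syllables in the names listed, the function name is hypothetical, and the names are assumed to be already split into syllables.

```python
# Tongyong -> Hanyu spellings, limited to the syllables that differ in the
# place names above (a hypothetical partial table, not a full converter).
TONGYONG_TO_HANYU = {
    "guei": "gui",
    "sin": "xin",
    "jhuang": "zhuang",
    "ciao": "qiao",
    "jhong": "zhong",
    "jheng": "zheng",
}


def tongyong_to_hanyu(syllables):
    """Convert a place name given as a list of Tongyong syllables to Hanyu Pinyin.

    Syllables spelled identically in both systems (shan, ban, he, dian)
    pass through unchanged; the result is written as one capitalized word.
    """
    word = "".join(TONGYONG_TO_HANYU.get(s.lower(), s.lower()) for s in syllables)
    return word.capitalize()
```

So `tongyong_to_hanyu(["Sin", "jhuang"])` returns Xinzhuang and `tongyong_to_hanyu(["Ban", "ciao"])` returns Banqiao.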

I don’t have any old screenshots of my own available at the moment, so for now I’ll refer you to an image that Fili used in an old post of his. Compare that with this screenshot I took a few minutes ago from Google Maps of the same section of Tainan:
screenshot from Google Maps of the same section of Tainan

Note especially how the name of the junior high school is presented.

  • Previously “Jian Xing Junior High School”.
  • Now “Jiàn Xìng Jr High School”.

This is typical of how in old maps some things were labeled (poorly) in Hanyu Pinyin. (Words, not bro ken syl la bles, are the basis for Pinyin orthography. This is a big deal, not a minor error.) And now such places are still labeled poorly in Hanyu Pinyin, but with the addition of tone marks.

I’d like to return to the earlier point about sloppy conversions. Surprisingly, 成都路 is given as “Chéng Doū Road” rather than as “Chéngdū Road”.
screenshot from Google Maps of 'Cheng Dou [sic] Rd', near Taipei's Ximending
Although “Xinpu” might not be the sort of name found in every romanization database, there is nothing in the least obscure about Chengdu, the name of a city of some 11 million people. Google Translate certainly knows the right thing to do with 成都路:
screenshot from Google Translate, showing how Google will translate '成都路' as 'Chengdu Rd'

But Google Maps doesn’t get even this simple point right, which likely indicates that the work was outsourced. Why would Google do this? And why wouldn’t it ensure that a better job was done? Because, really, so far the long-overdue conversion to Hanyu Pinyin in Google Maps for Taiwan is something of a botch.
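“Chéng Doū” is wrong twice over: the word should be Chéngdū, and even for dou the tone mark belongs on the o (dōu), not the u. Tone-mark placement follows a simple, mechanical rule: a or e takes the mark if present; in ou the o takes it; otherwise the last vowel does (which is why it’s guì and liú). A minimal sketch, assuming toneless syllables and numbered tones (1 to 4, anything else treated as neutral); the function name is hypothetical:

```python
# Each vowel with its tone-1 to tone-4 marked forms.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def add_tone_mark(syllable, tone):
    """Put a tone mark on a toneless syllable using the standard placement rule.

    tone is 1-4; any other value is treated as the neutral tone (no mark).
    Syllables without a vowel (e.g. "ng") are returned unchanged.
    """
    if tone not in (1, 2, 3, 4):
        return syllable
    lower = syllable.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in TONE_MARKS]
        if not vowels:
            return syllable
        idx = vowels[-1]  # the last vowel: hence guì, liú
    marked = TONE_MARKS[lower[idx]][tone - 1]
    if syllable[idx].isupper():
        marked = marked.upper()  # keep a capital letter capital
    return syllable[:idx] + marked + syllable[idx + 1:]
```

With this rule `add_tone_mark("Dou", 1)` yields Dōu, never Doū.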

Taipei County switches to Hanyu Pinyin

Street signs in Taipei County are beginning to be changed to Hanyu Pinyin. For Pinyin supporters here, this is a long-awaited development.

Here are some examples of new signs in Banqiao, the seat of the Taipei County Government. The photos were taken near the Fuzhong MRT station.

street sign in Banqiao, Taiwan, in Hanyu Pinyin: 'Fuzhong Rd.' 'Chongqing Rd.'

Xianmin Blvd. Sec. 1 (This is a vertical sign, too narrow for 'Xianmin' on one line, so it's hyphenated, with 'min' on the second line)

Zhongshan Rd. Sec. 1

This is one of the Tongyong signs about to be taken down. It’s at the same intersection as the “Zhongshan” sign above. [November 17 update: The sign is now gone.]
JHONG SHAN RD. SEC.1

The first roads to receive these signs are large ones, especially those connecting one city to another. This is probably going to be a long, slow process, which is certainly to be expected given (a) how damn long it took them to get this started and (b) that most signs never got changed to Tongyong Pinyin during the previous administration. My impression is that most street signs in Taipei County, especially in smaller towns and on smaller roads, remain in MPS2 (the Tongyong Pinyin of the 1980s).

Has anyone noticed any changes yet in Xindian, etc.?

I wish I could provide links to official announcements, etc. But so far I haven’t been able to find any. I have, however, spoken with officials from the county government who confirm the new policy, so I’m going ahead and announcing this here.

Nice to see no InTerCaps. Unfortunately, the apostrophe situation is SNAFU, with those responsible for the signage using outdated guidelines (calling for a hyphen instead of an apostrophe). But I’ve forwarded the central government’s current rules on this to those concerned, which I hope will help get the problem fixed before any such signs go up.
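For readers unfamiliar with the apostrophe rule at issue: in Hanyu Pinyin, when a syllable beginning with a, o, or e follows another syllable within the same word, an apostrophe marks the boundary (Xi’an, since Xian would read as the single syllable xian; likewise Taipei’s Da’an). The rule is simple enough to express in a few lines. This is a minimal sketch that assumes the word has already been split into syllables; the function name is hypothetical.

```python
# Vowels (plain or tone-marked) that trigger an apostrophe at the start of a
# non-initial syllable.
A_O_E = set("aoeāáǎàōóǒòēéěè")


def join_syllables(syllables):
    """Join the syllables of a single word into its Hanyu Pinyin spelling.

    An apostrophe (not a hyphen) is inserted before any non-initial syllable
    beginning with a, o, or e, e.g. Xi + an -> Xi'an.
    """
    word = syllables[0]
    for syllable in syllables[1:]:
        if syllable[0].lower() in A_O_E:
            word += "'"
        word += syllable
    return word
```

So `join_syllables(["Da", "an"])` gives Da'an, while `join_syllables(["Zhong", "shan"])` gives plain Zhongshan.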

-r endings, their pronunciation, and Pinyin spelling

cover of Chinese Romanization: Pronunciation and Orthography
Arrr! In recognition of International Talk Like a Beijinger Pirate Day, here are the rules for how to spell those -r endings in Hanyu Pinyin and how those endings affect the pronunciation of syllables. In many cases, it’s more complicated than just adding an -r sound at the end of the standard syllable.

This information is from Yin Binyong’s Chinese Romanization: Pronunciation and Orthography. The full section from this book is available in PDF form: r- Suffixed Syllables.

Written form → Actual pronunciation
-ar (mǎr, horse) → -ar (mǎr)
-air (gàir, lid) → -ar (gàr)
-anr (pánr, plate) → -ar (pár)
-aor (bāor, bundle) → -aor (bāor)
-angr (gāngr, jar) → -ãr (gãr)
-or (mòr, dust) → -or (mòr)
-our (hóur, monkey) → -our (hóur)
-ongr (chóngr, insect) → -õr (chõr)
-er (gēr, song) → -er (gēr)
-eir (bèir, back) → -er (bèr)
-enr (ménr, door) → -er (mér)
-engr (dēngr, lamp) → -ẽr (dẽr)
-ir* (zìr, Chinese character) → -er (zèr)
-ir (mǐr, rice) → -ier (mǐer)
-iar (xiár, box) → -iar (xiár)
-ier (diér, saucer) → -ier (diér)
-iaor (niǎor, bird) → -iaor (niǎor)
-iur (qiúr, ball) → -iour (qióur)
-ianr (diǎnr, bit) → -iar (diǎr)
-iangr (qiāngr, tune) → -iãr (qiãr)
-inr (xīnr, core) → -ier (xīer)
-ingr (língr, bell) → -iẽr (liẽr)
-iongr (xióngr, bear) → -iõr (xiõr)
-ur (tùr, rabbit) → -ur (tùr)
-uar (huār, flower) → -uar (huār)
-uor (huór, work) → -uor (huór)
-uair (kuàir, piece) → -uar (kuàr)
-uir (shuǐr, water) → -uer (shuěr)
-uanr (wánr, to play) → -uar (wár)
-uangr (kuāngr, basket) → -uãr (kuãr)
-unr (lúnr, wheel) → -uer (luér)
-ür (qǔr, song) → -üer (qǔer)
-üer (juér, peg) → -üer (juér)
-üanr (quānr, loop) → -üar (quār)
-ünr (qúnr, skirt) → -üer (quér)

Notes:

  • ã, õ, ẽ indicate nasalized a, o, e.
  • The -i marked with an asterisk indicates either of the apical vowels that follow zh, ch, sh, r and z, c, s.
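Because the written-to-spoken correspondences in the table depend only on the final (plus, for -ir, on the initial), they can be encoded as a simple lookup. A minimal sketch, assuming toneless input split into an initial and a final spelled as in the example words (so ü appears as u after j, q, and x, and y-/w- spellings are not handled); the function name is hypothetical:

```python
# Written -r finals (toneless) -> how they are actually pronounced, following
# Yin Binyong's table. ã, õ, ẽ mark nasalized vowels, as in the table.
R_FINALS = {
    "ar": "ar", "air": "ar", "anr": "ar", "aor": "aor", "angr": "ãr",
    "or": "or", "our": "our", "ongr": "õr",
    "er": "er", "eir": "er", "enr": "er", "engr": "ẽr",
    "iar": "iar", "ier": "ier", "iaor": "iaor", "iur": "iour",
    "ianr": "iar", "iangr": "iãr", "inr": "ier", "ingr": "iẽr", "iongr": "iõr",
    "ur": "ur", "uar": "uar", "uor": "uor", "uair": "uar", "uir": "uer",
    "uanr": "uar", "uangr": "uãr", "unr": "uer",
    "ür": "üer", "üer": "üer", "üanr": "üar", "ünr": "üer",
}
# Initials followed by the apical vowels written -i (the starred -ir row).
APICAL_INITIALS = {"zh", "ch", "sh", "r", "z", "c", "s"}
# Initials after which ü is spelled u.
J_Q_X = {"j", "q", "x"}


def pronounce_r(initial, final):
    """Return the toneless pronounced form of an -r syllable.

    final is spelled as in the example words (e.g. "anr", or "ur" for qur).
    Raises KeyError for finals not in the table.
    """
    if final == "ir":
        # -ir is read -er after zh, ch, sh, r, z, c, s and -ier elsewhere.
        sound = "er" if initial in APICAL_INITIALS else "ier"
    else:
        key = final
        if initial in J_Q_X and key.startswith("u"):
            key = "ü" + key[1:]  # qur is really qür
        sound = R_FINALS[key]
        if initial in J_Q_X:
            sound = sound.replace("ü", "u")  # spell it u again after j, q, x
    return initial + sound
```

So `pronounce_r("p", "anr")` gives par (pánr → pár, minus the tone), and `pronounce_r("z", "ir")` gives zer while `pronounce_r("m", "ir")` gives mier.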

Pinyin with audio and Chinese characters: Fortress Besieged

cover of the book 'The Besieged City' (围城)
Sinolingua’s terrific series of abridged editions of classic Chinese books includes one of my favorites, which may well be the finest novel written in Mandarin during the twentieth century: Qian Zhongshu’s Wéichéng (圍城/围城), best known in English as Fortress Besieged but published by Sinolingua with the English title of The Besieged City.

I’m very pleased to announce that Pinyin.info now offers the first chapter of Sinolingua’s edition of this book, along with an audio file of it being read aloud. This edition is in Mandarin, in word-parsed Hanyu Pinyin (with Chinese characters underneath), and includes a few notes in English.

Here’s the download page: Wéichéng (圍城/围城).

I’ve often told people who plan to go to China and want me to recommend a book that will help them “understand” the country (as if!) they’re about to visit: “By all means, read the Analects of Confucius, the Dàodéjīng, and the Zhuāngzǐ; but know in advance that they’ll be about as relevant to your trip as reading the Gospels would be to someone from China who’s about to travel to the West for the first time. And don’t waste your time with crap like The Tao of the Chinese Boardroom’s Inner Art of Feng-shui or whatever. Read Fortress Besieged. It’s as good a start as just about anything, and a lot more fun to read.”

The novel is also available in a fine English translation.

Related reading:

screenshot of part of a paragraph of the PDF of this book

Web pages with Mandarin text to speech

the Chinese character read 'niàn', with its Pinyin above it
My recent addition of Mandarin texts with audio to this site brought to mind the issue of text-to-speech for Mandarin.

Here are some Web pages that allow you to input texts (albeit very brief ones in most cases) in Chinese characters and hear them pronounced in Mandarin and, in a few instances, Cantonese as well.

  • Oddcast (Sitepal). Although one of the options is for “Taiwanese,” texts are not read in that language (Hoklo) but rather in Mandarin.
  • Cling
  • Sinovoice. Be sure to enter the “code” number or the text won’t be spoken aloud.
  • Ekho
  • Iflytek. This is particularly interesting because it can add Hanyu Pinyin above the Hanzi that are being read. Unfortunately, this does not work in Opera; but Firefox and IE are OK.

Does anyone have any favorites?

Ba Jin in Pinyin, with audio

illustration of two young men under an umbrella -- from Ba Jin's 'Family'
This bit of news is simply wonderful. As part of Sinolingua’s Abridged Chinese Classic Series, all three volumes in Bā Jīn’s “torrents” trilogy (Jīliú sānbùqǔ / 激流三部曲) are now available in abridged editions in word-parsed Hanyu Pinyin (with Chinese characters underneath), along with a few notes in English and mp3 files of the text being read aloud.

These books would make great material for those who are

  • studying Mandarin
  • trying to memorize Chinese characters
  • learning Hanyu Pinyin
  • wanting to read something in Mandarin that isn’t too damn hard but isn’t a children’s book either
  • looking for something to read in Mandarin that doesn’t require much or even any knowledge of Chinese characters (ABCs and other “overseas Chinese,” take note!)

Through the generosity of the publisher, Pinyin.info now offers sample chapters from each of these three classics of twentieth-century Chinese literature along with audio files of the text being read aloud.

I’m very pleased to offer samples from these books on this site and hope these editions will be enjoyed by many readers worldwide and become standard texts in many classrooms.

Taiwan train stations and the switch to Hanyu Pinyin

Although Hanyu Pinyin has been Taiwan’s official romanization system since the beginning of this year, there has so far been little to no progress in putting it on signage (at least from what I’ve seen). So I was pleased to see this sign earlier this week at the remodeled train station in Zhunan, Miaoli County.

sign atop train station reading 'ZHUNAN STATION' in large letters

Those big letters unmistakably spell out the name of the city in Hanyu Pinyin. Good.

But what about the use of romanization inside the station? Here’s a shot of part of a board listing the stations near Zhunan.

board listing the train stations near Zhunan

Let’s look at the systems used in the names above:

  • Xinfong — Hanyu Pinyin and Tongyong Pinyin mix
  • Zhubei — Hanyu Pinyin
  • Hsinchu — Wade-Giles
  • Xiangshan — Hanyu Pinyin
  • Qiding — Hanyu Pinyin (BTW, that’s a terrible Q: it isn’t distinct enough from an O, especially at a distance.)
  • Zhunan — Hanyu Pinyin
  • Zaociao — Tongyong Pinyin
  • Fongfu — Tongyong Pinyin
  • Miaoli — same in most systems

Once again we see the government’s incompetence when it comes to such simple things as spelling names correctly on signage.

But since at least “Zhunan” was right, what about signage for the same name beyond the train station?

Well, there’s still Tongyong Pinyin (“Jhunan”):

directional sign reading 'Central Jhunan'

And there’s still Tongyong’s predecessor, MPS2 (“Junan”), along with other systems, typos, and sloppy English:

signs reading 'Junan', 'West Sea Shore Highway.', 'Lung-Shan Rd.', and 'Chi Ding Bathing Beath.'

And there are still spellings that are simply wrong (“Jhuan”), regardless of the system:
directional sign above the highway, reading 'Jhuan Brewery'

I’ve said it before, I’ll say it again: “Taiwan’s romanization situation: plus ça change, plus c’est la même chose.”