Google Translate has a new function: conversion to Hanyu Pinyin, which would be exciting and wonderful if it were any good. But unfortunately it’s terrible, all things considered.
What Google has created is about at the same level as scripts hobbyists cobbled together the hard way about a decade ago from early versions of CE-DICT. Don’t get me wrong: I greatly admire what sites such as Ocrat achieved way back when. But for Google — with all of its data, talent, and money — to do essentially no better so many years later is nothing short of a disgrace.
To see Google Translate’s Pinyin function in action you must select “Chinese (Simplified)” or “Chinese (Traditional)” — not English — for the “Translate into” option. And then click on “Show romanization”.
For example, here’s what happens with the following text from an essay on simplified and traditional Chinese characters by Zhang Liqing:
談中國的“語”和“文”的問題,我覺得最好能先了解一下在中國通用的語言。中國的主要語言有哪些?為甚麼我說這個,而不說那個?因為環境?因為被強迫?因為我愛這個語言?因為有必要?因為這個語言很重要?也想想什麼是中國人的共同語言。用一個共同語言有必要嗎?為什麼?別的漢語的去向會怎麼樣?如果你使用中國的共同語言普通話,你了解這個語言的語法(比如“的, 得, 地“ 和“了” 的不同用法)嗎? 知道這個語言的基本音節(不包括聲調)只有408個嗎?

Google Translate will produce this:

tán zhōng guó de“yǔ“hé”wén”de wèn tí, wǒ jué de zuì hǎo néng xiān liǎo jiè yī xià zài zhōng guó tōng yòng de yǔ yán。zhōng guó de zhǔ yào yǔ yán yǒu nǎ xiē?wéi shèn me wǒ shuō zhè ge, ér bù shuō nà gè?yīn wèi huán jìng?yīn wèi bèi qiǎng pò?yīn wèi wǒ ài zhè ge yǔ yán?yīn wèi yǒu bì yào?yīn wèi zhè ge yǔ yán hěn zhòng yào?yě xiǎng xiǎng shén me shì zhōng guó rén de gòng tóng yǔ yán。yòng yī gè gòng tóng yǔ yán yǒu bì yào ma?wèi shé me?bié de hàn yǔ de qù xiàng huì zěn me yàng?rú guǒ nǐ shǐ yòng zhōng guó de gòng tóng yǔ yán pǔ tōng huà, nǐ liǎo jiě zhè ge yǔ yán de yǔ fǎ(bǐ rú“de, de, de“ hé“le” de bù tóng yòng fǎ) ma?zhī dào zhè ge yǔ yán de jī běn yīn jié(bù bāo kuò shēng diào) zhǐ yǒu408gè ma?
Here’s what’s wrong:
- This is all bro ken syl la bles instead of word parsing. (So it’s never even a question if they get the use of the apostrophe correct.)
- Proper nouns are not capitalized (e.g., zhōng guó vs. Zhōngguó).
- The first letter in each sentence is not capitalized.
- Punctuation is not converted but remains in double-width Chinese style, which is wrong for Pinyin.
- Spacing around most punctuation is also incorrect (e.g., although a space is added after a comma and a closing parenthesis, there’s no space after a period or a question mark. See also the spacing or lack thereof around quotation marks, numerals, etc.)
- Because of the lack of word parsing, some given pronunciations are wrong.
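Two of the mechanical items in the list above, punctuation conversion and sentence-initial capitals, are easy to illustrate. The following is a minimal Python sketch and not a description of how Google or any other converter works. The function names and the small punctuation table are hypothetical, chosen only for this example, and the code ignores harder cases such as quotation marks, numerals, and decimal points.

```python
import re

# Full-width punctuation and the half-width forms Pinyin text should use.
# (Illustrative subset only; a real converter would need a fuller table.)
FULLWIDTH_TO_HALFWIDTH = {
    "。": ".", ",": ",", "?": "?", "!": "!",
    ":": ":", ";": ";", "(": "(", ")": ")",
}

def normalize_punctuation(text):
    """Swap in half-width punctuation, then tidy the spacing around it."""
    for wide, narrow in FULLWIDTH_TO_HALFWIDTH.items():
        text = text.replace(wide, narrow)
    # Closing marks: no space before, exactly one space after.
    text = re.sub(r"\s*([.,?!:;)]+)\s*", r"\1 ", text)
    # Opening parenthesis: one space before, none after.
    text = re.sub(r"\s*\(\s*", " (", text)
    return text.strip()

def capitalize_sentences(text):
    """Uppercase the first letter of the text and of every later sentence."""
    return re.sub(r"(^|[.?!]\s+)(\w)",
                  lambda m: m.group(1) + m.group(2).upper(), text)

sample = "yǔ yán。zhōng guó de zhǔ yào yǔ yán yǒu nǎ xiē?"
print(capitalize_sentences(normalize_punctuation(sample)))
```

This only cleans up the surface. It does nothing about the broken syllables or the proper nouns, both of which depend on knowing where the words are.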
In my previous post I complained about Google Maps’ unfortunately botched switch to Hanyu Pinyin. I stated there that, unlike Google Maps, Google Translate would correctly produce “Chengdu” from “成都” (which it does when “translate into” is set for English). But I see that the romanization feature of Google Translate also fails this simple test. It generates the incorrect “chéng dōu”.
All of this indicates that Google apparently is using a poor database and not only has no idea of how Pinyin is meant to be written but also lacks an understanding of even the basic rules of Pinyin.
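To show why word parsing matters for pronunciation, here is a small Python sketch of greedy longest-match lookup against a word list. It is only an illustration under stated assumptions: the tiny lexicon is made up for this example (it is not CC-CEDICT or any real database), the function names are hypothetical, and real segmenters are considerably more sophisticated. The point is simply that a character with more than one reading, such as 都 (dōu or dū), gets the right one when it is looked up as part of a word rather than on its own.

```python
# Toy lexicon (hypothetical). Multi-character words carry their own reading,
# including capitalization for proper nouns; single characters carry a default.
LEXICON = {
    "成都": "Chéngdū",
    "重慶": "Chóngqìng",
    "中國": "Zhōngguó",
    "語言": "yǔyán",
    "成": "chéng",
    "都": "dōu",
    "重": "zhòng",
    "慶": "qìng",
    "的": "de",
}

MAX_WORD_LEN = max(len(word) for word in LEXICON)

def segment(text):
    """Greedy forward maximum matching: take the longest known word each time."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # Fall back to a single character when no longer word matches.
            if length == 1 or candidate in LEXICON:
                words.append(candidate)
                i += length
                break
    return words

def to_pinyin(text):
    """Romanize word by word, leaving unknown items unchanged."""
    return " ".join(LEXICON.get(word, word) for word in segment(text))

def per_character(text):
    """Romanize one character at a time, with no word parsing at all."""
    return " ".join(LEXICON.get(ch, ch) for ch in text)

print(per_character("成都"))  # character-by-character lookup
print(to_pinyin("成都"))      # word-level lookup
```

With this toy data the character-by-character route yields “chéng dōu”, while the word-level route yields “Chéngdū”, because the reading is attached to the whole word.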
If you should need to use a free Web-based Pinyin converter, avoid Google Translate. Instead use Adso (from the fine folk at Popup Chinese) or perhaps NCIKU or MDBG — all of which, despite their limitations (c’mon, guys, sentences begin with capital letters), are significantly better than what Google offers.
By the way, Google Translate will also romanize Japanese texts written in kanji and kana, Russian texts written in Cyrillic, etc. But I’ll leave those to others to analyze.
For lagniappe, here’s a real Hanyu Pinyin version of the text above:
Tán Zhōngguó de “yǔ” hé “wén” de wèntí, wǒ juéde zuìhǎo néng xiān liǎojiě yīxià zài Zhōngguó tōngyòng de yǔyán. Zhōngguó de zhǔyào yǔyán yǒu nǎxiē? Wèishénme wǒ shuō zhège, ér bù shuō nàge? Yīnwei huánjìng? Yīnwei bèi qiǎngpò? Yīnwei wǒ ài zhège yǔyán? Yīnwei yǒu bìyào? Yīnwei zhè ge yǔyán hěn zhòngyào? Yě xiǎngxiang shénme shì Zhōngguórén de gòngtóng yǔyán? Yòng yīge gòngtóng yǔyán yǒu bìyào ma? Weishenme? Biéde Hànyǔ de qùxiàng huì zěnmeyàng? Rúguǒ nǐ shǐyòng Zhōngguó de gòng tóng yǔyán Pǔtónghuà, nǐ liǎojiě zhège yǔyán de yǔfǎ (bǐrú “de” hé “le” de bùtóng yǒngfǎ) ma? Zhīdao zhège yǔyán de jīběn yīnjié (bù bàokuò shēngdiào) zhǐ yǒu 408 ge ma?
Chitsaou said
I think they should at least implement this feature with a Chinese word-breaking API. Yahoo! Taiwan provides such an API, named 斷章取義, and Mac OS X Leopard also comes with this kind of library (different from Yahoo!’s) …
dhd said
It’s even more astounding when you consider that word segmentation is a basic component of any Chinese language processing system. So it’s not like google doesn’t know how to do word passing. And it’s also not like their speech recognition group doesn’t have a pronunciation dictionary for Mandarin. Total fail…
dhd said
By which I mean word parsing…
Pinyin Expert Mark Swofford Rips Google Translate Pinyin New Ass, Recommends Adso - Official Dofufa Blog - said
[...] Swofford berates Google Translate’s new Pinyin/Romanization feature while praising David Lancashire’s Adsotrans [...]
Watching Google Lose China in Realtime said
[...] in full detail, so if you’re not familiar with the challenges involved I’d encourage reading his dissection. I don’t think the service is actually that bad (testing with verb and noun usages of 数 [...]
Nicholas Van Orton said
Dear Pinyin Info,
You need to fix quite a few tone marks in the last third of the text. Until you do, Google Translate has a clear edge over you in that section.
And should the word “putonghua” really be capitalized in pinyin? If it should, I wonder if this reflects influence from English.
Jens said
A corollary to this lack of word parsing is that there’s no indication of neutral tones. I mean, it’s bad enough to not parse words and then show the wrong sound altogether, but not indicating neutral tones is critically bad as well, in my view.
Pinyin Info said
Yet another example of why word parsing is so important: While looking for something unrelated I noticed that Google renders Chóngqìng (重慶) as Zhòng Qìng.
So not just Chengdu but also Chongqing. Does Google have something against Sichuan?
It appears that Kingwaytek Technology (Qínwǎi Kējì Gǔfèn Yǒuxiàn Gōngsī / 勤崴科技股份有限公司), a company in Taipei, is responsible for some of this.
Nongandwong said
Sometimes I wonder if it’s not poorly done on purpose. If the problems are sorted out, there won’t be much point in learning more than a few characters to read Chinese online; you will be able to just convert things back and forth.
BTW who can you actually complain to about these things? Google’s pinyin IME is great because you can toggle from simplified to traditional characters very easily, but it was probably made by PRC Chinese, because (like most PRC Chinese) it can’t tell the difference between “face” 面 and “noodles” 麵, or “mile” 里 and “inside” 裏. I’d like to inform them of the problem.
Austin said
The Japanese transliteration suffers the same problems; while Japanese has much more mutable pronunciation of kanji and so I haven’t seen any pronunciation errors like Chengdu, I presume it would stumble on some more esoteric combinations (particularly names, which can have multiple valid pronunciations for the same characters).
Google parsed words in the sample I put in very strangely; “daigaku” was together, but “niban” was not; it had no problems with “nippon” (not sure why it didn’t use nihon, but that’s another discussion) but separated arimasen-deshita, a normal polite-past verb conjugation, into “ari mase n deshi ta”.
The one thing I thought it would have issues with, particle “wa” versus the syllable “ha,” which use the same kana (は), it seemed to process just fine. (e vs he is fine too)
It transliterates particle を as “wo,” which is fine for speakers but might give others the wrong idea.
hsknotes said
Nongandwong:
maybe you hadn’t heard, but “google’s” pinyin ime is little more than a total theft of the sogou ime.
Kai Carver said
Google gets the Chengdu romanization right in one simple translation case:
http://translate.google.com/translate_t?text=Chengdu&sl=en&tl=zh-TW
(right, except for capitalization and syllable breakage: “chéng dū”)
but wrong in most others:
http://translate.google.com/translate_t?text=I+live+in+Chengdu&sl=en&tl=zh-TW
When I saw this half-nice Google feature, I immediately thought, poor Mark, he’s battled the government of Taiwan, now he has to take on Google… Discouraging… But jiā yoú!
Kai Carver said
> To see Google Translate’s Pinyin function in action
> you must select “Chinese (Simplified)” or “Chinese (Traditional)”
> — not English — for the “Translate into” option.
Actually there’s a way to go directly from Chinese characters to romanization:
if you specify English as the source language, and Chinese/Korean/Hindi/Russian/etc. as the target, but enter Chinese/Korean/Hindi/Russian/etc. characters, you get the romanization (but still have to click for it):
* Mandarin
* Korean
* Hindi
* Russian
You can just click on the “Swap languages” icon to get a translation.
(Darn, no Arabic or Hebrew).
Steve said
Wow, yeah, this is pretty bad. I could have sat down on a free afternoon and written a script that does nearly this well, and I don’t have millions of dollars and some of the world’s best computational linguists at my disposal. I really would expect a lot more out of Google.
Nongandwong said
Looks like they stole all the mistakes too…
Chinese through song « scribbles said
[...] http://www.popupchinese.com/tools/adso. It breaks it up into WORDS for you. Pinyin.info recommends [...]
Frank said
The latest version shows:
“Tán zhōngguó de “yǔ” hé “wén” de wèntí, wǒ juéde zuì hǎo néng xiān liǎo jiè yīxià zài zhōngguó tōngyòng de yǔyán.Zhōngguó de zhǔyào yǔyán yǒu nǎxiē?Wéishènme wǒ shuō zhège, ér bù shuō nàgè?Yīnwèi huánjìng?Yīnwèi bèi qiǎngpò?Yīnwèi wǒ ài zhège yǔyán?Yīnwèi yǒu bìyào?Yīnwèi zhège yǔyán hěn zhòngyào?Yě xiǎng xiǎng shénme shì zhōngguó rén de gòngtóng yǔyán.Yòng yīgè gòngtóng yǔyán yǒu bìyào ma?Wèishéme?Bié de hànyǔ de qùxiàng huì zěnme yàng?Rúguǒ nǐ shǐyòng zhōngguó de gòngtóng yǔyán pǔtōnghuà, nǐ liǎojiě zhège yǔyán de yǔfǎ (bǐrú “de, de, de “hé “le” de bùtóng yòngfǎ) ma?Zhīdào zhège yǔyán de jīběn yīnjié (bù bāokuò shēngdiào) zhǐyǒu 408 gè ma?”
Pinyin Info said
Many thanks for alerting us to the update, Frank.
Google still needs to capitalize proper nouns, correct spacing after punctuation, and work on its word parsing a little. But, still, what it produces now is an enormous improvement over its initial effort.
I’m very happy to see this and am hopeful that Google may actually get this right soon.
Frank said
I think the “correct spacing after punctuation” issue has been fixed now.
Pinyin news » Google Translate’s Pinyin converter revisited said
[...] Google Translate‘s Pinyin converter was first released about a year and a half ago, it sucked. Wow, did it ever suck. Since then, however, Google has instituted some changes. So it seems about [...]
Student10 said
Looks like the pinyin has totally disappeared from Translate now. So congratulations are in order to all you erudite critics.
Now we beginners are working directly from Chinese characters and have to rely on our defective ears to hear the subtle differences in spoken Mandarin. Stale bread was better than no bread at all.
Pinyin Info said
Bwahaha! The mighty power of this website and other forces in the wicked erudition movement have finally crushed the will of puny little multi-billion-dollar Google!
Um, no.
Try clicking on the “Ä” in the bottom-right corner of Google Translate’s input box.
And, anyway, the bread’s no longer so stale, though it could still use some improvement.
You’re welcome.
carlos said
Thanks for the article, but honestly, I think it is a great tool that will definitely improve with time.
Pinyin Info said
As I wrote in comments above, Google has since made some improvements:
Google Translate’s Pinyin converter revisited
Google Translate’s Pinyin converter: now with apostrophes
I’m closing the comments here.