Google Translate’s new Pinyin function sucks

Google Translate has a new function: conversion to Hanyu Pinyin, which would be exciting and wonderful if it were any good. But unfortunately it’s terrible, all things considered.

What Google has created is about at the same level as scripts hobbyists cobbled together the hard way about a decade ago from early versions of CE-DICT. Don’t get me wrong: I greatly admire what sites such as Ocrat achieved way back when. But for Google — with all of its data, talent, and money — to do essentially no better so many years later is nothing short of a disgrace.

To see Google Translate’s Pinyin function in action you must select “Chinese (Simplified)” or “Chinese (Traditional)” — not English — for the “Translate into” option. And then click on “Show romanization”.

For example, here’s what happens with the following text from an essay on simplified and traditional Chinese characters by Zhang Liqing:

談中國的“語”和“文”的問題,我覺得最好能先了解一下在中國通用的語言。中國的主要語言有哪些?為甚麼我說這個,而不說那個?因為環境?因為被強迫?因為我愛這個語言?因為有必要?因為這個語言很重要?也想想什麼是中國人的共同語言。用一個共同語言有必要嗎?為什麼?別的漢語的去向會怎麼樣?如果你使用中國的共同語言普通話,你了解這個語言的語法(比如“的, 得, 地“ 和“了” 的不同用法)嗎? 知道這個語言的基本音節(不包括聲調)只有408個嗎?

screenshot of Google Translate with the text above

Google Translate will produce this:
screenshot of Google Translate with the text above and how Google Translate puts this into Pinyin (see text below)

tán zhōng guó de“yǔ“hé” wén” de wèn tí, wǒ jué de zuì hǎo néng xiān liǎo jiè yī xià zài zhōng guó tōng yòng de yǔ yán。zhōng guó de zhǔ yào yǔ yán yǒu nǎ xiē?wéi shèn me wǒ shuō zhè ge, ér bù shuō nà gè?yīn wèi huán jìng?yīn wèi bèi qiǎng pò?yīn wèi wǒ ài zhè ge yǔ yán?yīn wèi yǒu bì yào?yīn wèi zhè ge yǔ yán hěn zhòng yào?yě xiǎng xiǎng shén me shì zhōng guó rén de gòng tóng yǔ yán。yòng yī gè gòng tóng yǔ yán yǒu bì yào ma?wèi shé me?bié de hàn yǔ de qù xiàng huì zěn me yàng?rú guǒ nǐ shǐ yòng zhōng guó de gòng tóng yǔ yán pǔ tōng huà, nǐ liǎo jiě zhè ge yǔ yán de yǔ fǎ(bǐ rú“de, de, de“ hé“le” de bù tóng yòng fǎ) ma?zhī dào zhè ge yǔ yán de jī běn yīn jié(bù bāo kuò shēng diào) zhǐ yǒu408gè ma?

Here’s what’s wrong:

  • This is all bro ken syl la bles instead of word parsing. (So it’s never even a question if they get the use of the apostrophe correct.)
  • Proper nouns are not capitalized (e.g., zhōng guó vs. Zhōngguó).
  • The first letter in each sentence is not capitalized.
  • Punctuation is not converted but remains in double-width Chinese style, which is wrong for Pinyin.
  • Spacing around most punctuation is also incorrect (e.g., although a space is added after a comma and a closing parenthesis, there’s no space after a period or a question mark. See also the spacing or lack thereof around quotation marks, numerals, etc.)
  • Because of lack of word parsing, some given pronunciations are wrong.
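
Two of the purely mechanical problems in the list above (Chinese-style punctuation with its spacing, and sentence-initial capitals) could be cleaned up with a few lines of JavaScript. What follows is only a rough sketch, not anything Google or the converters mentioned here actually use: the function name tidyPinyin and the punctuation table are my own assumptions, and the table is not exhaustive. It does nothing about word parsing or proper nouns, which need a real dictionary.

```javascript
// Hypothetical helper: maps full-width punctuation to ASCII (illustrative, not exhaustive).
const PUNCT = {
  "\u3002": ".", "\uFF0C": ",", "\uFF1F": "?", "\uFF01": "!",
  "\uFF1A": ":", "\uFF1B": ";", "\uFF08": "(", "\uFF09": ")",
};

function tidyPinyin(text) {
  // 1. Replace full-width punctuation with its ASCII counterpart.
  let out = text.replace(/[\u3002\uFF0C\uFF1F\uFF01\uFF1A\uFF1B\uFF08\uFF09]/g,
    (ch) => PUNCT[ch]);
  // 2. Add a space when punctuation is directly followed by a letter
  //    (numbers such as "15,000" are left untouched).
  out = out.replace(/([.,?!:;])(?=\p{L})/gu, "$1 ");
  // 3. Add a space between a letter and an opening parenthesis.
  out = out.replace(/(\p{L})\(/gu, "$1 (");
  // 4. Capitalize the first letter of the text and of each new sentence.
  out = out.replace(/(^|[.?!]\s+)(\p{L})/gu,
    (match, lead, letter) => lead + letter.toUpperCase());
  return out;
}

console.log(tidyPinyin("ni hao\u3002ni hao ma\uFF1Fhao"));
```

The syllables are still broken apart afterward, which is the harder problem.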

In my previous post I complained about Google Maps’ unfortunately botched switch to Hanyu Pinyin. I stated there that, unlike Google Maps, Google Translate would correctly produce “Chengdu” from “成都” (which it does when “translate into” is set for English). But I see that the romanization bug (or rather, feature) of Google Translate also fails this simple test. It generates the incorrect “chéng dōu”.

All of this indicates that Google apparently is using a poor database and not only has no idea of how Pinyin is meant to be written but also lacks an understanding of even the basic rules of Pinyin.

If you should need to use a free Web-based Pinyin converter, avoid Google Translate. Instead use Adso (from the fine folk at Popup Chinese) or perhaps NCIKU or MDBG — all of which, despite their limitations (c’mon, guys, sentences begin with capital letters), are significantly better than what Google offers.

By the way, Google Translate will also romanize Japanese texts written in kanji and kana, Russian texts written in Cyrillic, etc. But I’ll leave those to others to analyze.

For lagniappe, here’s a real Hanyu Pinyin version of the text above:

Tán Zhōngguó de “yǔ” hé “wén” de wèntí, wǒ juéde zuìhǎo néng xiān liǎojiě yīxià zài Zhōngguó tōngyòng de yǔyán. Zhōngguó de zhǔyào yǔyán yǒu nǎxiē? Wèishénme wǒ shuō zhège, ér bù shuō nàge? Yīnwei huánjìng? Yīnwei bèi qiǎngpò? Yīnwei wǒ ài zhège yǔyán? Yīnwei yǒu bìyào? Yīnwei zhège yǔyán hěn zhòngyào? Yě xiǎngxiang shénme shì Zhōngguórén de gòngtóng yǔyán. Yòng yīge gòngtóng yǔyán yǒu bìyào ma? Wèishénme? Biéde Hànyǔ de qùxiàng huì zěnmeyàng? Rúguǒ nǐ shǐyòng Zhōngguó de gòngtóng yǔyán Pǔtōnghuà, nǐ liǎojiě zhège yǔyán de yǔfǎ (bǐrú “de” hé “le” de bùtóng yòngfǎ) ma? Zhīdao zhège yǔyán de jīběn yīnjié (bù bāokuò shēngdiào) zhǐ yǒu 408 ge ma?

Web pages with Mandarin text to speech

image of a Chinese character with the pinyin 'niàn' above it

My recent addition to this site of Mandarin text with audio brought to mind the issue of text-to-speech for Mandarin.

Here are some Web pages that allow you to input texts (albeit very brief ones in most cases) in Chinese characters and hear them pronounced in Mandarin and, in a few instances, Cantonese as well.

  • Oddcast (Sitepal). Although one of the options is for “Taiwanese,” texts are not read in that language (Hoklo) but rather in Mandarin.
  • Cling
  • Sinovoice. Be sure to enter the “code” number or the text won’t be spoken aloud.
  • Ekho
  • Iflytek. This is particularly interesting because it can add Hanyu Pinyin above the Hanzi that are being read. Unfortunately, this does not work in Opera; but Firefox and IE are OK.

Does anyone have any favorites?

v for ü

Typing the letter v to produce ü is pretty standard in most Pinyin-related software — the letter v not being used in Pinyin except for loan words, and the letter ü not being found on traditional qwerty keyboards.

Here’s an official sign not far from Tian’anmen Square in Beijing that provides an example of an unconverted v.

official directional sign reading '织女桥东河沿 ZHINVQIAODONGHEYAN' in white letters against a blue background

Of course there’s the usual word-parsing trouble as well, which can indeed be tricky in some cases (but not so much that everythingneedstobewrittensolidlikethis).

This should be “Zhīnǚ Qiáo dōng héyán” (织女桥东河沿 / 織女橋東河沿 / Weaver Girl’s Bridge, east bank) or perhaps “Zhinü Qiao Dong Heyan” or “ZHINÜ QIAO DONG HEYAN”.

Some people might not think this is worth categorizing as a problem. My position, however, is that government has an obligation to write things properly on its official signage. (If this were on some ad hoc sign put up privately it would still be interesting but less problematic.) So, if anyone’s OK with the V, would you also be OK with, say, “??????”?

OTOH, as mistakes go, at least v remains distinct, unlike when ü gets incorrectly written as u, which is so common in Taiwan that I don’t recall ever having seen a ü on official signage. (Pinyin has the following distinct pairs: nü and nu, lü and lu. Nüe (rare) and lüe are also used, but not nue or lue, since the latter two sounds are not used in modern standard Mandarin.)

major updates to Chinese KEY

If you are using one or more programs from the Chinese Key family of software, you should definitely update if you haven’t in the past few months, as some significant improvements have been made.

One of the things I particularly like about Key is that it has the rare virtue of following proper Pinyin orthography. So if you’re not familiar with it, you might want to give it or one of its sibling programs a 30-day test drive.

No, I get no kickbacks from the company; I just admire the software.

new tools for writing Pinyin

I’ve received word from software writers of not one but two useful new tools for writing Hanyu Pinyin with tone marks (i.e., not using Pinyin to enter Chinese characters but really writing Hanyu Pinyin texts).

Pīnyīn Editor, by Bengt Moss-Petersen, is an online tool that currently works best with IE 6+ and Firefox.

click to visit the online Pinyin editor

(I made text much larger than the default size, since I had to reduce the image to make it fit in my blog. Users can choose among several sizes and fonts.)

And Pinyin Builder, by Wayne Kirk, is freeware for Windows systems.

click to visit the download page for Pinyin Builder

If you have an open Microsoft Office document, clicking Pinyin Builder’s “GO” button will insert your Pinyin text into that document. You don’t need to bother with copying and pasting.

In both of these, ü + tone mark is produced by v + tone number. Pinyin Builder also offers a combination using the CTRL key.

The tone number can be entered either immediately after the vowel or later in the syllable (e.g., zho1ng, zhong1, and zhon1g all yield “zhōng”). Pinyin Editor also offers the option to simply click on buttons with the vowels and tone marks.
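
For the curious, here is roughly how that kind of input could be turned into a marked syllable. This is my own minimal sketch, not the code of either tool: the function name addToneMark is hypothetical, it handles one lowercase syllable at a time, and it assumes the usual placement rule (the mark goes on a or e if present, on the o of ou, otherwise on the last vowel). The tone digit may sit anywhere in the syllable, and v stands in for ü.

```javascript
// Combining marks for tones 1-4: macron, acute, caron, grave.
const TONE_MARKS = ["\u0304", "\u0301", "\u030C", "\u0300"];

function addToneMark(input) {
  // Pull out the tone digit wherever it was typed; no digit (or 5) means neutral tone.
  const found = input.match(/[1-5]/);
  const tone = found ? Number(found[0]) : 5;
  // Drop the digit and turn the stand-in letter v into ü (U+00FC).
  const syl = input.replace(/[1-5]/g, "").replace(/v/g, "\u00FC");
  if (tone === 5) return syl;

  // Decide which vowel carries the mark.
  let idx = syl.search(/[ae]/);            // a or e always takes the mark
  if (idx === -1) idx = syl.indexOf("ou"); // in "ou" the mark goes on the o
  if (idx === -1) {
    for (let i = syl.length - 1; i >= 0; i--) { // otherwise the last vowel
      if ("iou\u00FC".includes(syl[i])) { idx = i; break; }
    }
  }
  if (idx === -1) return syl; // no vowel found: return the input unmarked

  // Attach the combining mark and compose to a single character (NFC).
  const marked = (syl[idx] + TONE_MARKS[tone - 1]).normalize("NFC");
  return syl.slice(0, idx) + marked + syl.slice(idx + 1);
}

console.log(addToneMark("zho1ng"), addToneMark("zhong1"), addToneMark("nv3"));
```

Building the marked vowel from a combining mark plus normalize("NFC") avoids having to type out a table of all twenty-four marked vowels.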

I hope people make frequent use of both of these terrific new tools.


convert Chinese characters to Unicode character references: javascript

I’ve had a spate of requests recently for the code for this site’s tool that converts Chinese characters to Unicode numeric character references (i.e., something that converts, say, “漢語拼音” into “&#28450;&#35486;&#25340;&#38899;”). Since I’m a believer in open-source work, and since people could find the code anyway if they look carefully enough in the Web page’s source code, I might as well publish it.

This tool can be very handy when making Web pages that use a variety of scripts. (It works on Cyrillic, etc., as well.) I often employ it myself.

Here’s the heart of the code:

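function convertToEntities() {
  var tstr = document.form.unicode.value;  // text typed into the input box
  var bstr = '';
  for (var i = 0; i < tstr.length; i++) {
    // The > 127 (non-ASCII) test is assumed; the condition is missing from the listing.
    if (tstr.charCodeAt(i) > 127) {
      bstr += '&#' + tstr.charCodeAt(i) + ';';  // numeric character reference
    } else {
      bstr += tstr.charAt(i);  // plain ASCII passes through unchanged
    }
  }
  document.form.entity.value = bstr;  // show the result in the output box
}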

This sleek little bit of Javascript is originally by Steve Minutillo and is used here with his permission. I may have tweaked the code a little myself, but that was so long ago I don’t remember well. (I’ve had the converter here for about five years.) Anyway, if you use this, please acknowledge Steve’s authorship; and of course I always greatly appreciate links back to this site.
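
If you’d like to try the same idea outside a Web form, here is a variant written as a plain function. It is a sketch of my own rather than Steve’s code: the name toNumericReferences is hypothetical, and the cutoff of 127 (leave ASCII alone, convert everything else) is an assumption. It walks the string by code point, so rare characters beyond the Basic Multilingual Plane come out as one reference instead of two surrogate halves.

```javascript
// Hypothetical stand-alone converter: string in, string out, no form fields needed.
function toNumericReferences(text) {
  let out = "";
  // for...of steps through the string one code point at a time.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    // Assumed rule: ASCII stays as typed, anything above 127 becomes &#NNNN;
    out += codePoint > 127 ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

console.log(toNumericReferences("Pinyin: \u6F22\u8A9E\u62FC\u97F3"));
```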

If anyone knows how to do the same thing in PHP, preferably with no more code than used above, please let me know.

See also: separating Pinyin syllables: PHP code.

software for Shanghainese

Professor Qián Nǎiróng (Qian Nairong / 錢乃榮) of Shanghai University has just issued free software to help with the writing of Shanghainese (上海话). People may now download the 1.3 MB zip file of the program.

Some examples:

shanghe 上海
shanghehhehho 上海闲/言话(上海话)
whangpugang 黄浦江
shyti 事体(事情)
makshy 物事(东西)
bhakxiang 白相(玩)
dangbhang 打朋(开玩笑)
ghakbhangyhou 轧朋友(交朋友)
cakyhangxiang 出洋相(闹笑话,出丑)
linfhakqin 拎勿清(不能领会)
dhaojiangwhu 淘浆糊(混)
aoshaoxhin 拗造型(有意塑造姿态形象)
ghe 隑(靠)
kang 囥(藏)
yin 瀴(凉、冷)
dia 嗲
whakji 滑稽

The program offers two flavors of romanization. Here are some examples of the differences between the two styles:

New Folk        Old Timers
makshy          mekshy          物事(东西)
bhakxiang       bhekxian        白相(玩)
dangbhang       danbhan         打朋(开玩笑)
ghakbhangyhou   ghakbhanyhou    轧朋友(交朋友)
cakyhangxiang   cekyhanxian     出洋相(闹笑话,出丑)
linfhakqin      linfhekqin      拎勿清(不能领会)

Here’s a brief story on this:

Xiànzài, wǒmen zài wǎngluò zhōng liáotiān de shíhou yuèláiyuè duō de péngyou dōu kāishǐ xǐhuan yòng Shànghǎihuà. Dànshì yǒushíhou shìbushì juéde xiǎng biǎodá dehuà bùzhīdào zěnme dǎ, nòng de yǒudiǎn bùlúnbùlèi ne? Xiànzài, yī ge kěyǐ qīngsōng dǎchū Shànghǎihuà de chéngxù chūlai le.

Jīngguò liǎng nián nǔlì, Shànghǎi dàxué Zhōngwénxì Qián Nǎiróng jiàoshòu jí tā de yánjiūshēng hé dādàng zhōngyú yú běnyuè wánchéng le Shànghǎihuà shūrùfǎ de zhìzuò. Zhíde guānzhù de shì, zhè tào shūrùfǎ hái bāokuò xīn-lǎo liǎng ge bǎnběn, 45 suì yǐshàng de lǎo Shànghǎirén hé niánqīng yī dài de Shànghǎirén dōu kěyǐ zhǎodào zìjǐ de “dǎfǎ.”

Háishi tóngyàng 26 ge zìmǔ de jiànpán, 8 yuè 1 rì qǐ xiàzài le Shànghǎihuà shūrùfǎ zhīhòu, nín jiù kěyǐ tōngguò shūrù “linfhakqin” dǎchū “līn wù qīng,” shūrù “dhaojiangwhu” dǎchū “táo jiànghu” děng yuánzhī yuán wèi de Shànghǎihuà le. Zuótiān, jìzhě tíqián xiàzài dào gāi ruǎnjiàn. Ànzhào shǐyòng shuōmíng, yòng quánpīn de fāngshì chángshì shūrù “laoselaosy” zhèxiē zìmǔ, píngmù shàng, lìjí chūxiàn le “lǎo sānlǎo sì” (Shànghǎihuà, yìsi shì “màilǎo, chōng lǎochéng de yàngzi”).

Jùxī, yóuyú Shànghǎihuà yǔ Pǔtōnghuà de dúfǎ yǒusuǒbùtóng, suǒyǐ zài pīnyīn pīnxiě fāngshì shàng háishi xūyào shǐyòng shuōmíng de bāngzhù. Bǐrú jìzhě fāxiàn, fánshì yǔ Pǔtōnghuà shēngmǔ, yùnmǔ xiāngtóng de zì, zài Shànghǎihuà shūrùfǎ zhōng zuìzhōng yòng de háishi Pǔtōnghuà pīnyīn, bùtóng de zé cǎiyòng Shànghǎihuà shūrùfǎ de pīnxiě fāngshì. Rú “chénguāng” de “chén,” “huātou” de “tóu” dōu fāchéng zhuóyīn, Shànghǎihuà pīnyīn shūrùfǎ zhōng yào zài shēngmǔ zhōng jiā yī ge zìmǔ h, pīnchéng “shen,” “dhou;” fánshì rùshēng zì, zé zài pīnyīn hòu jiā zìmǔ k, rú “báixiāng” de “bái” jiù pīnchéng bhek.

Bùguò, dàjiā bùyào juéde tài nán. Jìzhě fāxiàn, Shànghǎihuà shūrùfǎ yǔ Pǔtōnghuà de shūrùfǎ zuìdà xiāngtóng zhī chǔzài yú, zhǐyào liánxù shūrù shēngmǔ hé yùnmǔ jiù kěyǐ, bùxū shūrù shēngdiào. Cǐwài, Shànghǎihuà pīnyīn shūrù xìtǒng háiyǒu lèisì “zhìnéng” yōudiǎn, kěyòng suōlüè fāngshì bǎ cíyǔ pīnxiě chūlai.

Zhǔchí Shànghǎihuà shūrùfǎ kāifā de Shànghǎi dàxué Zhōngwénxì Qián Nǎiróng jiàoshòu gàosu jìzhě, zhè tào shūrùfǎ bùjǐn néng dǎchū Shànghǎihuà dà cídiǎn zhōng 15,000 duō ge cítiáo, érqiě hái néng yòng Shànghǎihuà pīnyīn dǎchū Shànghǎihuà zhōng shǐyòng zhe de, yǔ Pǔtōnghuà cíyì xiāngtóng dàn yǔyīn bùtóng de chángyòng cíyǔ. Rú “Huángpǔ Jiāng” shūrù “whangpugang” , “lǐxiǎng” zéshì lixiang děng, gòngjì 10,000 duō ge cítiáo.


Find Chinese characters online by drawing them with your mouse

Nciku, a Web site that bills itself as “more than a dictionary,” has a nifty feature that allows users to find Chinese characters by drawing them with a mouse.

interface for the character-drawing tool

As you draw, possible character matches will appear in the box to the right of your drawing, with the results refined as your drawing progresses. You don’t need to know the canonical stroke order to get this to work, nor do your calligraphy skills need to be perfect, as this example shows.
screenshot showing the results with a sloppily drawn 拼 (the 'pin' of 'Pinyin')

Once you see the correct character offered as a choice, click on it and it will be entered into the search box for the site’s online dictionary. This dictionary feature can handle multiple-character input and will even prompt you with likely choices to fill out your search.

