Google Translate’s Pinyin converter: now with apostrophes

Google has taken another major step toward making Google Translate’s Pinyin converter decent. Finally, apostrophes.

Not long ago “阿爾巴尼亞然而仁愛蓮藕普洱茶” would have yielded “Āěrbāníyǎ ránér rénài liánǒu pǔěr chá.” But now Google produces the correct “Ā’ěrbāníyǎ rán’ér rén’ài lián’ǒu pǔ’ěr chá.” (Well, one could debate whether that last one should be pǔ’ěr chá, pǔ’ěrchá, Pǔ’ěr chá, Pǔ’ěr Chá, or Pǔ’ěrchá. But the apostrophe is undoubtedly correct regardless.)
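The apostrophe rule at work here is simple to state: within a word, a syllable that begins with a, e, or o gets an apostrophe in front of it when it follows another syllable. A minimal Python sketch of that rule is below. The function name `join_syllables` is hypothetical, the input is assumed to be already split into the syllables of a single word, and none of this says anything about how Google actually implements it.

```python
import unicodedata

APOSTROPHE = "\u2019"  # the typographic apostrophe used in the examples above


def join_syllables(syllables):
    """Join the Pinyin syllables of ONE word, adding apostrophes where needed.

    Assumption: `syllables` is a list of tone-marked syllables that all
    belong to the same word, e.g. ["rán", "ér"].
    """
    if not syllables:
        return ""
    word = syllables[0]
    for syllable in syllables[1:]:
        # Strip the tone mark from the first letter so that "ě" counts as "e".
        base = unicodedata.normalize("NFD", syllable)[0].lower()
        if base in "aeo":
            word += APOSTROPHE
        word += syllable
    return word


for parts in (["Ā", "ěr", "bā", "ní", "yǎ"], ["rán", "ér"], ["lián", "ǒu"]):
    print(join_syllables(parts))
```

Syllables that begin with a consonant, as in péngyoumen, pass through unchanged, so the same helper can be used for every multi-syllable word.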

Also, the -men suffix is now solid with words (e.g., 朋友們 → péngyoumen and 孩子們 → háizimen). This is a small thing but nonetheless welcome.

The most significant remaining fundamental problem is the capitalization and parsing of proper nouns.

And numbers are still wrong, with everything being written separately. For example, “七千九百四十三萬五千六百五十八” should be rendered as “qīqiān jiǔbǎi sìshísān wàn wǔqiān liùbǎi wǔshíbā.” But Google is still giving this as “qī qiān jiǔ bǎi sì shí sān wàn wǔ qiān liù bǎi wǔ shí bā.”
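To make the expected grouping concrete, here is a rough Python sketch that regroups separately written number syllables. The rules it encodes are my reading of the example above and should be treated as assumptions: a digit is joined to a following bǎi or qiān, numbers from 11 to 99 are written solid, and wàn or yì is joined only when a single digit precedes it. The name `group_number` is hypothetical, and irregular cases such as líng are simply passed through.

```python
import unicodedata

DIGITS = {"yī", "èr", "liǎng", "sān", "sì", "wǔ", "liù", "qī", "bā", "jiǔ"}
SMALL_UNITS = {"bǎi", "qiān"}  # hundreds, thousands
BIG_UNITS = {"wàn", "yì"}      # ten thousands, hundred millions


def _attach(word, syllable):
    # Add an apostrophe before a syllable starting with a, e, or o (e.g. shí’èr).
    base = unicodedata.normalize("NFD", syllable)[0].lower()
    separator = "\u2019" if base in "aeo" else ""
    return word + separator + syllable


def group_number(syllables):
    """Regroup separately written number syllables into words (a sketch)."""
    tokens = [unicodedata.normalize("NFC", s) for s in syllables]
    words = []
    section_start = 0  # index in `words` where the current wàn/yì section begins
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in DIGITS and nxt in SMALL_UNITS:
            # Digit + bǎi/qiān is one word: jiǔ bǎi -> jiǔbǎi
            words.append(tok + nxt)
            i += 2
        elif tok == "shí" or (tok in DIGITS and nxt == "shí"):
            # Tens, with an optional units digit, are one word: sì shí sān -> sìshísān
            if tok == "shí":
                word, i = tok, i + 1
            else:
                word, i = tok + nxt, i + 2
            if i < len(tokens) and tokens[i] in DIGITS:
                word = _attach(word, tokens[i])
                i += 1
            words.append(word)
        elif tok in BIG_UNITS:
            section = words[section_start:]
            if len(section) == 1 and section[0] in DIGITS:
                words[-1] += tok   # single digit before it: sān wàn -> sānwàn
            else:
                words.append(tok)  # anything longer: wàn stands alone
            section_start = len(words)
            i += 1
        else:
            # Bare digits, líng, and anything unrecognized stay as they are.
            words.append(tok)
            i += 1
    return words


raw = "qī qiān jiǔ bǎi sì shí sān wàn wǔ qiān liù bǎi wǔ shí bā"
print(" ".join(group_number(raw.split())))
```

Fed Google's current output for 七千九百四十三萬五千六百五十八, the sketch regroups it into the form given above as correct.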

On the other hand, Google has started to deal with “le” by appending it to verbs. This is relatively tricky to get right, so I’m not surprised Google doesn’t have the details down yet.

So there’s still a lot of work to be done. But at least progress is being made in areas of fundamental importance, and I’m heartened by that.


The current state:
[Screenshot: what Google Translate’s Pinyin converter produces as of late September 2011]
