Google Translate’s Pinyin converter: now with apostrophes

Google has taken another major step toward making Google Translate’s Pinyin converter decent. Finally, apostrophes.

Not long ago “??????????????” would have yielded “??rb?níy? ránér rénài lián?u p??r chá.” But now Google produces the correct “?’?rb?níy? rán’ér rén’ài lián’?u p?’?r chá.” (Well, one could debate whether that last one should be p?’?r chá, p?’?rchá, P?’?r chá, P?’?r Chá, or P?’?rchá. But the apostrophe is undoubtedly correct regardless.)
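The apostrophe rule at work here is simple to state: within a word, a non-initial syllable that begins with a, e, or o takes an apostrophe before it. Here is a minimal Python sketch of that rule, assuming the word has already been split into syllables (the hard part, which this does not attempt). The function names are mine, for illustration only; this is not how Google does it.

```python
import unicodedata

APOSTROPHE = "\u2019"  # the curly apostrophe used in the examples above


def base_letter(ch: str) -> str:
    """Return the letter with any tone mark removed, lowercased."""
    return unicodedata.normalize("NFD", ch)[0].lower()


def join_syllables(syllables: list[str]) -> str:
    """Join the syllables of one word, adding an apostrophe before any
    non-initial syllable that starts with a, e, or o.
    Assumes every syllable is a non-empty string."""
    word = ""
    for i, syl in enumerate(syllables):
        if i > 0 and base_letter(syl[0]) in "aeo":
            word += APOSTROPHE
        word += syl
    return word


print(join_syllables(["rán", "ér"]))          # rán’ér
print(join_syllables(["lián", "ǒu"]))         # lián’ǒu
print(join_syllables(["péng", "you", "men"])) # péngyoumen
```

Stripping the tone mark before the check means the rule only has to look at three base letters rather than every accented vowel.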

Also, the -men suffix is now solid with words (e.g., 朋友们 –> péngyoumen and 孩子们 –> háizimen). This is a small thing but nonetheless welcome.

The most significant remaining fundamental problem is the capitalization and parsing of proper nouns.

And numbers are still wrong, with everything being written separately. For example, “七千九百四十三万五千六百五十八” should be rendered as “qīqiān jiǔbǎi sìshísān wàn wǔqiān liùbǎi wǔshíbā.” But Google is still giving this as “qī qiān jiǔ bǎi sì shí sān wàn wǔ qiān liù bǎi wǔ shí bā.”
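To show the kind of regrouping involved, here is a rough Python sketch that takes syllable-by-syllable output and joins it the way the example above is written: a digit goes solid with a following shí, bǎi, or qiān, and a units digit goes solid with a preceding shí. It deliberately leaves wàn (and anything else it does not recognize) as a separate word, so it covers only the pattern shown in this post, not every case in the orthographic rules. The names `regroup_number` and `join_group` are hypothetical.

```python
DIGITS = {"yī", "èr", "liǎng", "sān", "sì", "wǔ", "liù", "qī", "bā", "jiǔ"}
SMALL_UNITS = {"shí", "bǎi", "qiān"}
VOWEL_INITIALS = "aeoāáǎàēéěèōóǒò"  # syllables starting with these need an apostrophe


def join_group(group: list[str]) -> str:
    """Write one group solid, with an apostrophe before a vowel-initial syllable."""
    out = group[0]
    for syl in group[1:]:
        out += ("\u2019" if syl[0] in VOWEL_INITIALS else "") + syl
    return out


def regroup_number(spaced: str) -> str:
    """Regroup space-separated number syllables (a simplified sketch)."""
    groups: list[list[str]] = []
    for tok in spaced.split():
        prev = groups[-1] if groups else None
        if tok in SMALL_UNITS and prev is not None and len(prev) == 1 and prev[0] in DIGITS:
            prev.append(tok)       # digit + unit, e.g. qī + qiān
        elif tok in DIGITS and prev is not None and prev[-1] == "shí":
            prev.append(tok)       # tens + units, e.g. sìshí + sān
        else:
            groups.append([tok])   # everything else starts a new word
    return " ".join(join_group(g) for g in groups)


print(regroup_number("qī qiān jiǔ bǎi sì shí sān wàn wǔ qiān liù bǎi wǔ shí bā"))
# qīqiān jiǔbǎi sìshísān wàn wǔqiān liùbǎi wǔshíbā
```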

On the other hand, Google is starting to deal with “le”, with it being appended to verbs. This is a relatively tricky thing to get right, so I’m not surprised Google doesn’t have the details down yet.

So there’s still a lot of work to be done. But at least progress is being made in areas of fundamental importance, and I’m heartened by it.

The current state:
screen shot of what Google Translate's Pinyin converter produces as of late September 2011

Taiwanese romanization used for Hanzi input method

Since I just posted about the new Hakka-based Chinese character input method, I would be remiss not to note as well the release early this year of a different Chinese character input method based on Taiwanese romanization.

This one is available in Windows, Mac, and Linux flavors.

See the FAQ and documents below for more information (Mandarin only).

Táiwān Mǐnnányǔ Hànzì shūrùfǎ 2.0 bǎn xiàzài (臺灣閩南語漢字輸入法 2.0版下載) [Readers may wish to note the use of Minnan, which is generally preferred among unificationists and some advocates of Hakka and the languages of Taiwan's tribes.]

source: Jiàoyùbù Táiwān Mǐnnányǔ Hànzì shūrùfǎ (教育部臺灣閩南語漢字輸入法); Ministry of Education, Taiwan; June 16, 2010(?) / February 14, 2011(?) [Perhaps the Windows and Linux versions came first, with the Mac version following in 2011.]

Hakka romanization used for new Hanzi input method

Chinese character associated with Hakka morpheme ng?i

Taiwan’s Ministry of Education has released software for Windows and Linux systems that uses Hakka romanization for the inputting of Chinese characters.

This appears to be aimed mainly at those who wish to input Hanzi used primarily in writing Hakka, such as that shown here.

See also Taiwanese romanization used for Hanzi input method.

Key Chinese updated, adding new Pinyin features

The program Key, which offers probably the best support for Hanyu Pinyin of any software and thus deserves praise for this alone, has just come out with an update with even more Pinyin features: Key 5.2 (build: August 21, 2011 — earlier builds of 5.2 do not offer all the latest features).

Those of you who already have the program should get the update, as it’s free. But note that if you update from the site, the installer will ask you to uninstall your current version prior to putting in the update, so make sure you have your validation code handy or you’ll end up with no version at all.

(If you don’t already have Key, I recommend that you try it out. A 30-day free trial version can be downloaded from the site.)

Anyway, here’s some of what the latest version offers:

  • Hanzi-with-Pinyin horizontal layout gets preserved when copied into MS Word documents (RT setting), as well as in .html and .pdf files created from such documents.
  • Pinyin Proofing (PP) assistance: with pinyin text displayed, pressing the PP button on the toolbar will colour the background of ambiguous pinyin passages blue; right-clicking on such a blue-background pinyin passage will display the available options.
  • Copy Special: a highlighted Chinese character passage can be copied & pasted automatically in various permutations.
  • Improved number-measureword system: it now works with Chinese-character, pinyin and Arabic numerals.
  • Showing different tones through coloured characters (Language menu under Preferences).
  • Chengyu (fixed four character expression) spacing logic: automatic spacing according to the pinyin standard (Language menu under Preferences).
  • Option to show tone sandhi on grey background (Language menu under Preferences).
  • Full support of standard pinyin orthography in capitalization and spacing.
  • Automatic glossary building.

Some programs, such as Popup Chinese’s “Chinese converter,” will take Chinese characters and then produce pinyin-annotated versions, with the Pinyin appearing on mouseover. Key, however, offers something extra: the ability to produce Hanzi-annotated orthographically correct Pinyin texts (i.e., the reverse of the above). If you have a text in Key in Chinese characters, all you have to do is go to File --> Export to get Key to save your text in HTML format.

Here’s a sample of what this looks like.

Běn biāozhǔn guīdìngle yòng《 Zhōngwén pīnyīn fāng’àn》 pīnxiě xiàndài Hànyǔ de guīzé。 Nèiróng bāokuò fēncí liánxiě fǎ、 chéngyǔ pīnxiěfǎ、 wàiláicí pīnxiěfǎ、 rénmíng dìmíng pīnxiěfǎ、 biāodiào fǎ、 yíháng guīzé děng。

Basically, this is a “digraphia export” feature — terrific!

If you want something like the above, you do not have to convert the Hanzi to orthographically correct Pinyin first; Key will do it for you automatically. (I hope, though, that they’ll fix those double-width punctuation marks one of these days.)
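For anyone curious what Hanzi-annotated Pinyin might look like in HTML, here is a small Python sketch that wraps each Pinyin word in a `<ruby>` element with the Hanzi as the annotation. This is only one way to do it and is my own assumption; I am not claiming it matches the markup Key actually exports. The function name `annotate` and the word pairs are supplied by hand for illustration.

```python
from html import escape


def annotate(pairs: list[tuple[str, str]]) -> str:
    """Build HTML with Pinyin as the base text and Hanzi as the ruby annotation.

    Each pair is (pinyin_word, hanzi_for_that_word), already word-parsed.
    """
    parts = []
    for pinyin, hanzi in pairs:
        parts.append(f"<ruby>{escape(pinyin)}<rt>{escape(hanzi)}</rt></ruby>")
    return " ".join(parts)


sample = [("Běn", "本"), ("biāozhǔn", "标准"), ("guīdìngle", "规定了")]
print(annotate(sample))
```

Keeping the pairs at the word level, rather than syllable by syllable, is what preserves the word parsing in the output.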

Let’s say, though, that you want a document with properly word-parsed interlinear Hanzi and Pinyin. Key will do this too. To do this, input a Hanzi text in Key, then highlight the text (CTRL + A) and choose Format --> Hanzi with Pinyin / Kanji-Kana with Romaji.

In the window that pops up, choose Hanzi with Pinyin / Kanji-kana with Romaji / Hangul with Romanization from the Two-Line Mode section and Show all non-Hanzi symbols in Pinyin line from Options. The results will look something like this:

GIF of a screenshot from Key, showing an interlinear text with word-parsed Pinyin above Chinese characters. This is an image of the text after being pasted into Microsoft Word.

This can be extremely useful for those authoring teaching materials.

Furthermore, such interlinear texts can be copied and pasted into Word. For the interlinear-formatted copy-and-paste into Word to work properly, Key must be set to rich text format, so before selecting the text you wish to use click on the button labeled RT. (Note yellow-highlighted area in the image below.)

screenshot identifying the location of the button that needs to be pressed to make the text RTF

How to handle ‘de’ and interjections in Hanyu Pinyin

cover image for the book

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) deals with how to write Mandarin’s various de’s, mood particles, and interjections.

This reading is available in two versions:

  • simplified Chinese characters: ???? ????? (zhùcí, tàncí)
  • traditional Chinese characters: ???? ?????

I’ve already written about the principles in previous posts. For example, see

How to write numbers and measure words in Hanyu Pinyin

cover image for the book

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) is about writing numbers and measure words.

This reading is available in two versions:

For more on this, see these posts and the PDFs linked to therein.

How to write verbs in Hanyu Pinyin (Mandarin text)

cover image for the book

Here’s the first of several selected readings from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》). It covers the writing of verbs.

This reading is available in two versions:

  • simplified Chinese characters: ???? ??
  • traditional Chinese characters: ???? ??

For those who would like to read about this in English, see

important book on Pinyin to be excerpted on this site

cover image for the book

Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) is the second of Yin Binyong’s two books on Pinyin orthography. The first, Chinese Romanization: Pronunciation and Orthography, is in English and Mandarin; much of it is already available here on Pinyin.Info.

Although Xinhua Pinxie Cidian is only in Mandarin, the large number of examples makes it easy to get the point even if you may not read Mandarin in Chinese characters very well.

This week I will begin posting some excerpts from this invaluable work. What’s more, I have made a version in traditional Chinese characters, which I hope that readers in Taiwan, Hong Kong, and elsewhere will take advantage of. So those not used to reading simplified Chinese characters will have a choice (which is more than the government of Taiwan is providing these days).

I’m extremely happy to be able to bring you this information and wish to acknowledge the generosity of the Commercial Press. Stay tuned.