Google Translate’s Pinyin converter revisited

When Google Translate’s Pinyin converter was first released about a year and a half ago, it sucked. Wow, did it ever suck. Since then, however, Google has instituted some changes. So it seems about time this was reexamined.

Fortunately, Google’s Pinyin converter is now much better than before.

Here’s the sort of FUBAR romanization — it certainly doesn’t deserve to be called Hanyu Pinyin — Google used to produce:

tán zh?ng guó de“y?“hé” wén” de wèn tí? w? jué de zuì h?o néng xi?n li?o jiè y? xià zài zh?ng guó t?ng yòng de y? yán?… rú gu? n? sh? yòng zh?ng guó de gòng tóng y? yán p? t?ng huà? n? li?o ji? zhè ge y? yán de y? f??b? rú“de? de? de“ hé“le” de bù tóng yòng f?? ma?zh? dào zhè ge y? yán de j? b?n y?n jié?bù b?o kuò sh?ng diào? zh? y?u408gè ma?

Now the same passage will look like this:

Tán zh?ngguó de “y?” hé “wén” de wèntí, w? juéde zuì h?o néng xi?n li?o jiè y?xià zài zh?ngguó t?ngyòng de y?yán…. Rúgu? n? sh?yòng zh?ngguó de gòngtóng y?yán p?t?nghuà, n? li?oji? zhège y?yán de y?f? (b?rú “de, de, de “hé “le” de bùtóng yòngf?) ma? Zh?dào zhège y?yán de j?b?n y?njié (bù b?okuò sh?ngdiào) zh?y?u 408 gè ma?

At last! Capitalization at the beginning of a sentence and word parsing! But — you knew there was going to be a but, didn’t you? — Google’s Pinyin converter falls significantly short because it still fails completely in two fundamental areas: capitalization of proper nouns and proper use of the apostrophe.

1. Proper Nouns

Google’s Pinyin converter fails to follow the basic point of capitalizing proper nouns. For example, here are some well-known place names. I have prefixed the names with “在” (zài) because Google automatically capitalizes the first word in a line; so to see how it handles capitalization of place names, something other than the name must go first.

screenshot showing what happens if the following is entered into Google Translate: '???, ???, ???, ???'. That leads to the following in Google Translate: 'in Xi'an, in Chang [sic], in Chongqing, in Beijing'. But the romanization line reads 'Zai xian, Zai changan, Zai chongqing, Zai beijing'

Google Translate gets these right, other than the odd truncation of Chang’an. But the Pinyin converter (see the gray text at the bottom of the image above) fails to capitalize these, even though it correctly parses them as units and thus must “know” their meanings.

The same thing happens with personal names.

Input this:

????
????
????

Google Translate provides this:

Is Ma Ying-jeou
Mao Zedong
Chen Shui-bian

Those are correct, if the missing instances of “Is” are discounted.

But the Pinyin appears as “Shì mǎyīngjiǔ Shì máozédōng Shì chénshuǐbiǎn”. So even though the software understands that these names are units, the capitalization and word parsing are still wrong and they are still not rendered as they should be in Pinyin: “Mǎ Yīngjiǔ,” “Máo Zédōng,” “Chén Shuǐbiǎn.”

There is nothing obscure about capitalizing proper nouns. How did this get missed?

2. Apostrophes

The cases of Xi’an and Chang’an above already demonstrate apostrophe omission. Let’s try a few more tests, including some words that are not proper nouns.

Input this:

?????
??
??
??

The Pinyin is rendered as “??rb?níy? Ránér Rénài Lián?u” rather than the correct forms of ?’?rb?níy?, rán’ér, rén’ài, and lián’?u.

As always I want to stress that, whatever you might have heard elsewhere, apostrophes are not optional. But the rules for their use are easy — so easy that I suspect a fairly simple computer script could fix this problem quickly and simply. (Only about 2 percent of Mandarin words, as written in Hanyu Pinyin, have apostrophes.)
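To give a sense of how small such a script could be, here is a minimal Python sketch of the core apostrophe rule: when a syllable beginning with a, e, or o follows another syllable in the same word, an apostrophe goes before it. The function names are my own, and the sketch assumes the converter already knows where the syllable and word boundaries fall; it is an illustration, not a description of how Google's software works.

```python
import unicodedata

APOSTROPHE = "\u2019"  # the typographic apostrophe used in this post


def starts_with_a_e_o(syllable):
    """True if the syllable's first letter, ignoring any tone mark, is a, e, or o."""
    first = unicodedata.normalize("NFD", syllable)[:1].lower()
    return first in ("a", "e", "o")


def join_syllables(syllables):
    """Join the syllables of ONE word, adding apostrophes where Pinyin requires them."""
    word = ""
    for index, syllable in enumerate(syllables):
        # The rule applies only to non-initial syllables of a word.
        if index > 0 and starts_with_a_e_o(syllable):
            word += APOSTROPHE
        word += syllable
    return word


print(join_syllables(["rán", "ér"]))
print(join_syllables(["běi", "jīng"]))
```

The first call should print the word with an apostrophe and the second without one, since "jīng" does not begin with a, e, or o.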

As is the case with the mistakes with proper nouns, these apostrophe errors are all the more puzzling because Google Translate does not appear to share them. Fortunately, these problems should not be particularly difficult to fix, especially if the Pinyin converter can make better use of Google Translate’s database.

Although Google’s failures to implement capitalization of proper nouns and apostrophe use are significant problems, they could likely be corrected quickly and easily. (I strongly suspect this would take considerably less time than it has taken for me to write this post.) The result would be a vastly improved converter. So I am hopeful that Google will work on this soon.

3. Additional work

Once Google gets those basics fixed, it should focus on the simple matter of correcting spacing before and after some quotations (which would surely take just a few minutes to take care of) and any other such spacing errors, and fixing its word parsing related to numbers (which is a bit more complicated, though the basics are easy: everything from 1 to 100 is written solid).

Next would come something requiring a bit more care: the proper handling of Mandarin’s three tense-marking particles: zhe, guo, and le.

And Google should attach the pluralizing suffix -men to the word it modifies rather than leaving it separate (e.g., háizimen, not háizi men).

Then, with all of those taken care of, Google would have a pretty good Pinyin converter that I would be happy to praise. Of course even then it could still use other improvements; but those would most likely deal more with particulars than the fundamentals of how Pinyin is meant to be written.

A separate post, to be written soon, will compare the performance of several Pinyin converters (including Google’s). Stay tuned.

Wenlin releases major upgrade (4.0)

One of my favorite programs, Wenlin (which bills itself as “software for learning Chinese”), has just released a major upgrade for both Mac and Windows versions. This doesn’t happen often; it has been three-and-a-half years since the most recent big change was issued (Wenlin 3.4) and heaven only knows how long since 3.0 came out. So, yes, this release has many substantial improvements.

One of the features nearest and dearest to my heart is that Wenlin 4.0 features greatly improved handling of Pinyin. I was among the field testers for the new version, so I’ve already spent a lot of time examining this feature. Here are a few important aspects of this:

  • Conversions from Chinese characters follow Hanyu Pinyin orthography much more closely than before. This is a major change for the better. (There’s still some room for improvement. But I don’t think we’ll have to wait years for this.)
  • In the past, using Wenlin to convert long texts in Chinese characters into Pinyin could be a real chore, with users having to examine example after example of Chinese characters with multiple pronunciations in order to select the proper pronunciation for that particular context. But now users may, if they so desire, tell Wenlin not to ask users for disambiguation input. Of course, that doesn’t mean that Wenlin will always guess right; but many users will be happy that this trade-off allows them to skip the frustration of, for example, having to tell the program over and over and over that, yes, in this case ? is pronounced shu? rather than shuì.
  • Relative newcomers to Mandarin may appreciate that for common words tone sandhi is indicated in Wenlin with additional marks (a dot or line below the vowel). This feature can also be turned off, for those who want standard Pinyin.

There are, of course, many improvements beyond the area of Pinyin. Here are a few:

  • One limitation of Wenlin 3.x was that its English dictionary wasn’t very large. But Wenlin 4.0 includes not only the ABC Chinese-English Comprehensive Dictionary but also the excellent new ABC English-Chinese, Chinese-English Dictionary (now finally in stock in the printed version).
  • The flashcards are now set up to handle not just individual characters but polysyllabic words.
  • There’s full Unicode Unihan 6.0 support for more than 75,000 Chinese characters.
  • And for those who think 75,000 just isn’t enough, users can now access Wenlin’s CDL technology. Through this, users can create new, variant, and rare characters; moreover, these can be published and shared with other Wenlin users or CDL-friendly devices.
  • Seal script versions of more than 11,000 characters are provided.
  • Wenlin contains an e-edition of the Shuowen Jiezi (Shu?wén Ji?zì / ???? / ????).
  • Coders will be interested to know that Wenlin appears to be headed toward becoming open-source.
  • Both Mandarin and English entries are marked with grade levels, which aids learners by indicating relative frequency of use. The levels for Mandarin words are based on the Hanyu Shuiping Kaoshi (Hàny? Sh?ipíng K?oshì / ?????? / ?????? / HSK).

The full version (i.e., the CD with the program comes in a box and is likely packaged with a hard copy of the manual) is US$199, or US$179 if you download it from the Wenlin Web store. Upgrades from 3.x cost US$49.

For more information, see the summary of features and outline of what’s new in Wenlin 4.0.

screenshot from Wenlin 4.0

.sg domain names in Chinese characters lag

Between November 23, 2009, when Singapore first began registering .sg names in Chinese characters, and June 10, 2010, when registrations of Chinese-character .sg domain names opened to all without any additional fee, only 1,024 such names were registered, or just 0.88 percent of all .sg domain names. This apparently includes not just second-level domains (e.g., ??.sg) but also third-level domains (e.g., ??.com.sg).

The percentage will likely rise in the coming months, as the process has only recently opened to everyone on a first-come, first-served basis. But, still, demand for such names in Singapore has so far been underwhelming.

A bit more information:

Registrations were accepted in phases, with registrations for government organizations starting on Nov. 23, 2009. Beginning in January, SGNIC began accepting domain name registrations from trademark holders.

During the third phase, the general public was allowed to register domain names starting on March 25, but applicants were charged a “priority fee” of S$100 (US$72) for each domain name, with domain names sought by several applicants awarded to the highest bidder.

In all three phases, applicants could apply for a domain name made up of Chinese numbers or a name with just one Chinese character for a fee of S$500 [US$360]….

The fourth and final phase began on June 10, with SGNIC accepting domain name applications on a first-come, first-served basis. The S$100 priority fee is no longer required, but applicants are no longer allowed to register domain names using Chinese numbers or names with just one Chinese character….

When IDA announced the introduction of Chinese-language domain names last year, SGNIC said the effort was partly intended to help Singaporean businesses target the Chinese market.

source: Singapore registers 1,000 Chinese-language domain names, IDG News Service, June 23, 2010

Baidu adds handwriting input

Baidu has just added a function that allows people to use their mouse to write Chinese characters for searches.

On the Baidu home page, click on “手写” (shǒuxiě/手寫/handwrite).

This will bring up a pop-up box in which you can use your mouse to write Hanzi. This functions in basically the same way as the mouse-writing tool that Nciku added about two years ago.

source: Baidu.com’s Search Box Now Supports Chinese Handwriting Input, China Tech News, June 16, 2010

Le Grand Ricci now available on DVD

The magnificent Grand dictionnaire Ricci de la langue chinoise, better known as le Grand Ricci, has just been released on DVD, almost a decade after its release in book form and exactly four hundred years after the death of Matteo Ricci.

The list price is 120 euros (about US$150), which is much cheaper than the printed edition. A long video in French (16:31) discusses the work. For those who would prefer something in English, a PDF gives background information on the dictionary project.

For a sample of the dictionary’s format and entries, see the 25 pages of entries for shan. Alas, as this example shows, the entries are not word parsed. But at least Hanyu Pinyin is now available for those who prefer it to Wade-Giles.

As long as I’m mentioning Ricci-related work, I might as well use the occasion to note that the Taipei Ricci Institute is putting its collection of books on permanent loan to Taiwan’s National Central Library.

Also, I’d like to note that parts of Matteo Ricci’s original dictionary can now be viewed through the Google Books scan of a publication from earlier this century of his Dicionário Português-Chinês.

Enjoy.

image from a manuscript page of Ricci's original dictionary

How to strip subtitle files down to text

Subtitle files are wonderful things. But for those times when you want to just read the text by itself and not bother with the movie (for example, if you want to prepare a script), they can look a little cluttered — what with all of that extra timing information.

1
00:00:49,000 --> 00:00:51,500
Yo! Li ye lai la

2
00:00:52,200 --> 00:00:53,600
Li ye lai la

3
00:01:06,900 --> 00:01:08,400
Xiulian

The directions below for how to remove all of the extra numbers, etc., refer to Microsoft Word, since most people already have that tool.

To strip out everything except for the text of the subtitles, run the following wildcard search (CTRL+H –> More –> Use wildcards).

Find what:
^13[0-9:\,\-\> ]{1,}^13

Replace with:
^p

Replace all.

Note: You may need to run the above “replace all” twice. Also, unless you add an extra return at the top of the document you’ll need to clean up the first entry by hand.

The above search-and-replace will yield

Yo! Li ye lai la

Li ye lai la

Xiulian
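For those who would rather not do this in Word, the same stripping can be sketched in a few lines of Python. This is only an illustration, not a polished tool: the function name is made up, and it assumes a standard SubRip (.srt) layout in which each entry is a counter line, a timing line containing "-->", and then the text, with blank lines between entries.

```python
import re

# A standard SRT timing line, e.g. "00:00:49,000 --> 00:00:51,500".
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}"
)


def strip_srt(srt_text):
    """Return only the subtitle text, with one blank line between entries."""
    entries = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        # Drop the counter line ("1", "2", ...) if the entry has one.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line if it comes next.
        if lines and TIMING_LINE.match(lines[0]):
            lines = lines[1:]
        if lines:
            entries.append("\n".join(lines))
    return "\n\n".join(entries)


sample = """1
00:00:49,000 --> 00:00:51,500
Yo! Li ye lai la

2
00:00:52,200 --> 00:00:53,600
Li ye lai la

3
00:01:06,900 --> 00:01:08,400
Xiulian
"""

print(strip_srt(sample))
```

Unlike the Word approach, this needs no second pass and no hand cleanup of the first entry.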

If, however, you want to at least temporarily keep the basic timing information (such as to help you identify scene boundaries more quickly), you can do so as follows.

Find what (wildcards):
^13[0-9]{1,}^13([0-9\:]{1,})([0-9\:\-\> \,]{1,})^13

Replace with:
^p\1^p

Again, unless you add an extra return at the top of the document you’ll need to clean up the first entry by hand.

This will result in the document looking like this:

00:00:49
Yo! Li ye lai la

00:00:52
Li ye lai la

00:01:06
Xiulian
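The version that keeps the start time of each entry can be sketched in Python as well. As before, the function name is hypothetical and the code assumes standard SRT timing lines that use "-->".

```python
import re

# Capture only the hh:mm:ss part of the start time on a timing line.
START_TIME = re.compile(r"^(\d{2}:\d{2}:\d{2}),\d{3}\s*-->")


def keep_start_times(srt_text):
    """Drop counters and end times; keep each entry's start time and its text."""
    entries = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        # Drop the counter line ("1", "2", ...) if the entry has one.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = START_TIME.match(lines[0])
        if match:
            # Replace the full timing line with just the start time.
            lines = [match.group(1)] + lines[1:]
        entries.append("\n".join(lines))
    return "\n\n".join(entries)


sample = """1
00:00:49,000 --> 00:00:51,500
Yo! Li ye lai la

2
00:00:52,200 --> 00:00:53,600
Li ye lai la
"""

print(keep_start_times(sample))
```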

Once you’re through with the timing information, you can strip it out using the first search-and-replace above.

How to create Hanyu Pinyin subtitles

Since posting about the Pinyin subtitles for Crouching Tiger, Hidden Dragon and The Story of Stuff I have received several messages inquiring about how someone might make Pinyin subtitles themselves. So I might as well put the answer online.

Although at the present stage of software implementation subtitle conversion isn’t as simple as pushing a button, the process is not particularly difficult, assuming you have a good source text to work from. But this does require some time and the right tools.

The Right Tools

The most important tool is, of course, the one that performs the conversion to Hanyu Pinyin. And it’s crucial to keep in mind that not all Pinyin converters are created equal; in fact, the vast majority of so-called Pinyin converters are best avoided entirely. The world does not need any more texts in the hobbled, poorly written mess that many people erroneously think of as Hanyu Pinyin; but it very much needs texts in real Hanyu Pinyin. So don’t waste your time with a program that doesn’t do a good job of word parsing, etc.

At present the clear front-runner for converting Chinese characters to Hanyu Pinyin texts (real Hanyu Pinyin texts) with a minimum need for user assistance is Key Chinese (Windows and Mac). The demo version is fully functional for 30 days. Key’s considerably less expensive “Hanzi To Pinyin With Tones Conversion Utility” for MS Word texts (also with a 30-day demo) would probably also work well, though I haven’t tried it myself.

Wenlin (Windows and Mac) is another excellent program that can produce properly spelled and word-parsed Hanyu Pinyin. But it requires users to run some disambiguation themselves, which can take a lot of time when you’re talking about something with as much text as a screenplay. Nonetheless, Wenlin’s incorporation of John DeFrancis’s ABC Chinese-English Comprehensive Dictionary makes it a helpful reference when performing post-conversion checks. Also, especially if one does not have Key, Wenlin — even the function-limited but non-expiring demo version — is useful for handling some adjustments (such as removing tone marks or providing a workaround when dealing with programs that don’t handle Chinese characters well).

You’ll also need a Unicode-friendly text editor with good support of regular expressions (to allow wildcard searches). I like EmEditor, which is Windows based. But lots of other programs would work. One could even use MS Word if so inclined.

Finally, having subtitles in an additional language (usually but not necessarily English) is often desirable, not just for others who would use these subtitles but for yourself as you create the Pinyin subtitles. But often the subtitles one may find in Mandarin are not in synch with those in another language. Software can fix this problem. But I don’t have enough experience with this to recommend certain programs over others.

To sum up, the tools I recommend for creating Hanyu Pinyin subtitles are

  1. Key Chinese
  2. Wenlin
  3. EmEditor (or another Unicode-friendly text editor)
  4. a subtitle synchronizer

Actually, just the first one, Key, is sufficient to produce Pinyin subtitles. But in my experience using a combination of all four programs is preferable.

Now it’s time to get down to business.

The Main Steps

  1. acquire source-version subtitles
  2. synchronize subtitle files
  3. identify names of the movie’s characters (dramatis personae)
  4. perform initial conversion of subtitles in Chinese characters to Pinyin
  5. double check the results and perform necessary cleanup
  6. create additional version without tone marks
  7. share your work

1. Acquire subtitles for conversion and reference

At present the most useful site for finding Mandarin subtitles written in Chinese characters is probably Shooter. You may need to try searching for your desired title in both simplified and traditional characters. Also, be aware that movies — especially movies not filmed in Mandarin — often have different names in China, Taiwan, Hong Kong, etc.

You may find it useful to look for subtitles in other languages, too. Shooter can be useful for that, though you may have better luck finding English subtitles at Opensubtitles.org or similar English-language sites.

One can often find different subtitle files for the same movie, so you may wish to examine more than one for quality. Another thing that’s worth keeping in mind: Converting from traditional Chinese characters to simplified Chinese characters is less problematic than vice versa.

2. Synchronize subtitle files

Once you have the files, you should synchronize them with each other according to the directions for the particular program you are using.

If the program you’re using for this chokes on Chinese characters, though, you’ll need to take a couple of extra steps. First, convert the Chinese characters to Unicode numerical character references using either Pinyin Info’s NCR conversion tool or Wenlin (full or demo version). The reason for this is that even synchronizers that screw up “李慕白” should be able to handle the NCR equivalent: “&#26446;&#24917;&#30333;”.

In Wenlin,
Edit –> Make transformed copy –> Encode &#; [decimal]

Take the NCR text and synchronize the files. After you get this taken care of, reconvert to Chinese characters.

In Wenlin,
Edit –> Make transformed copy –> Decode &#;
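If neither tool is handy, the round trip to decimal NCRs and back is simple enough to sketch in Python. The two function names below are my own; the sketch leaves plain ASCII alone and encodes everything else, which is an assumption about what a fussy synchronizer can tolerate rather than a description of what Wenlin does.

```python
import re


def encode_ncr(text):
    """Replace every non-ASCII character with a decimal NCR such as &#26446;."""
    return "".join(
        ch if ord(ch) < 128 else "&#{};".format(ord(ch)) for ch in text
    )


def decode_ncr(text):
    """Turn decimal NCRs back into the characters they stand for."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)


encoded = encode_ncr("李慕白")
print(encoded)
print(decode_ncr(encoded))
```

Since timing lines and counters are pure ASCII, they pass through both functions unchanged.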

3. Identify names of the movie’s characters

You must teach your software which strings of Hanzi represent names. For example, it’s crucial for clarity that the character name “李慕白” is written “Lǐ Mùbái” rather than as “lǐ mù bái”. This part takes some time up front. But do not skip this step, because it is not only crucial but will save a lot of trouble in the long run.

Before doing this, however, people may want to refamiliarize themselves with Hanyu Pinyin’s rules for proper nouns (PDF). Note especially what is supposed to be capitalized and what isn’t.

The Mandarin version of Wikipedia is one resource that can be helpful in identifying the names of at least the main characters in the movie. But you’ll want to look for more names and forms than will be listed there. Keep in mind that characters aren’t always addressed by their full names. You need to look for other forms as well (e.g., in Crouching Tiger, Hidden Dragon Li Mubai is sometimes referred to as “Li Mubai” but other times as “Li ye” or simply as “Mubai”) and enter them.

English subtitles can be very useful for locating most proper nouns in the text. (Hooray for word parsing and capitalization of proper nouns!) The following search of an English subtitle file should help pinpoint the location of proper nouns.

find (with “Match Case” and “Use Regular Expressions” checked):
[^\.]\s[A-Z][a-z]

in MS Word, find (with “Use wildcards” checked):
[!\.] [A-Z][a-z]
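The same idea can be sketched in Python for those who want a list of candidates along with the line numbers where they occur. The function name is hypothetical, and I have tweaked the pattern slightly: besides periods, it also skips capitals that follow a question mark or exclamation point, and it uses a lookbehind so that both halves of a name like "Li Mubai" are reported.

```python
import re

# A capitalized word that follows a space, where the character before
# that space is not sentence-ending punctuation (. ? !).
MID_SENTENCE_CAPITAL = re.compile(r"(?<=[^.?!\s]\s)([A-Z][a-z]+)")


def find_mid_sentence_capitals(text):
    """Return (line_number, word) pairs for likely proper nouns."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in MID_SENTENCE_CAPITAL.finditer(line):
            hits.append((number, match.group(1)))
    return hits


sample = "Yo! Li is here.\nI saw Li Mubai today.\nHe left. She stayed."
for number, word in find_mid_sentence_capitals(sample):
    print(number, word)
```

Like the searches above, this misses names that open a line or a sentence, so treat the results as leads rather than as a complete list.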

Since you’ve already synchronized your subtitles, you’ll easily be able to find the corresponding point in the Mandarin subtitles by looking at the time the line appears.

As you gather the names, or after you compile the full list, add your findings to the Pinyin converter’s user dictionary. In Key, perform Language –> Add Record, then fill in the Hanzi and Pinyin fields.

4. Perform initial conversion to Pinyin

OK, I know you’re eager to run the conversion and see all of those Hanzi turn into lovely Hanyu Pinyin. But there’s one quick step you need to do first. If you’re using Key Chinese, the program won’t make use of all of those character names you just painstakingly added to the user dictionary unless you first run “linguistic reconstruction” on the subtitles you wish to convert:
Language –> Linguistic Reconstruction

Now you’re ready for the big step:
Language –> Convert to Pinyin

5. Double check the results and perform necessary cleanup

Unfortunately, most Pinyin converters, even the best, tend to be lazy about inserting spaces in some of the places they belong, such as around numeric and alphabetic strings. For example, “自3月22日（星期一）起至5月31日（星期一）” will generally convert to something that looks like this:
“zì3yuè22rì (Xīngqīyī) qǐ zhì5yuè31rì (Xīngqīyī)”.
But it should look like this:
“zì 3 yuè 22 rì (Xīngqīyī) qǐ zhì 5 yuè 31 rì (Xīngqīyī)”.

To fix this in your Pinyin text, run the following regular expression in EmEditor. Make sure “Match Case” is not checked.
find:
([a-zāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ])([0-9]+)([a-zāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ])

replace:
\1 \2 \3

If you do this in Word, you’ll need to use the following instead in your wildcard search.
find:
([A-Za-zĀÁǍÀĒÉĚÈĪÍǏÌŌÓǑÒŪÚǓÙǕǗǙǛāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ])([0-9]{1,})([A-Za-zĀÁǍÀĒÉĚÈĪÍǏÌŌÓǑÒŪÚǓÙǕǗǙǛāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ])

replace:
\1 \2 \3
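For anyone scripting this step instead, here is a rough Python equivalent. It is a sketch with a made-up function name, and it differs from the searches above in one respect: rather than listing every tone-marked vowel, it treats any Unicode letter next to a digit as a place that needs a space, so a number at the start or end of a word is handled too.

```python
import re
import unicodedata


def space_around_digits(text):
    """Insert a space wherever a letter and a digit touch."""
    # Compose base letters and tone marks so each vowel is one character.
    text = unicodedata.normalize("NFC", text)
    # [^\W\d_] matches any Unicode letter (word character, not digit, not "_").
    text = re.sub(r"(?<=[^\W\d_])(?=\d)", " ", text)
    text = re.sub(r"(?<=\d)(?=[^\W\d_])", " ", text)
    return text


print(space_around_digits("zì3yuè22rì (Xīngqīyī) qǐ zhì5yuè31rì (Xīngqīyī)"))
```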

The rest of the cleanup work usually involves you simply reading through the text, looking for errors, perhaps while listening to the movie.

6. Create additional version without tone marks

If you have Key, this is very easy: Highlight the entire text, then
Format –> Strip Tone Marks.

And you’re done, though because Key keeps u-umlaut as such, if your television or other device doesn’t show the letter ü correctly you may wish to convert “ü” to “v”.

If you don’t have Key or access to another program that can do the same thing as easily, then use a combination of Wenlin (again, even the demo will do what you need) and a text editor. First, paste your Pinyin text into Wenlin. Then select all of the text and perform
Edit –> Make transformed copy… –> Replace tone marks with 1-4

Copy and paste the results into a new document in your text editor. Then run the following search-and-replace. Make certain the “Use Regular Expressions” or “Use Wildcards” box is checked.

find:
([A-Za-z])([1-4])

replace with:
\1
Then click “Replace All”.
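If you have neither program, tone marks can also be removed directly with a short Python script. This is a sketch under my own assumptions, not what Key or Wenlin does internally: it decomposes each letter, drops only the four combining tone marks, and recomposes, so ü survives unless you ask for it to become v. The function name and the umlaut_as_v option are invented for this example.

```python
import unicodedata

# Combining macron, acute, caron, and grave: the four Pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone_marks(text, umlaut_as_v=False):
    """Remove Pinyin tone marks, keeping the diaeresis on ü by default."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    result = unicodedata.normalize("NFC", stripped)
    if umlaut_as_v:
        # For devices that cannot display ü.
        result = result.replace("ü", "v").replace("Ü", "V")
    return result


print(strip_tone_marks("Lǐ Mùbái"))
print(strip_tone_marks("nǚ'ér", umlaut_as_v=True))
```

Be aware that this also strips acute and grave accents from any non-Pinyin text mixed into the file, such as French names.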

What this looks like in EmEditor:
image showing the search-and-replace dialog box for the above


7. Share your work

It’s much better if people can concentrate on producing new material rather than having to redo things others have already taken care of. So if you make a good Hanyu Pinyin version of something, please let me know.

Google Translate’s new Pinyin function sucks

Google Translate has a new function: conversion to Hanyu Pinyin, which would be exciting and wonderful if it were any good. But unfortunately it’s terrible, all things considered.

What Google has created is about at the same level as scripts hobbyists cobbled together the hard way about a decade ago from early versions of CE-DICT. Don’t get me wrong: I greatly admire what sites such as Ocrat achieved way back when. But for Google — with all of its data, talent, and money — to do essentially no better so many years later is nothing short of a disgrace.

To see Google Translate’s Pinyin function in action you must select “Chinese (Simplified)” or “Chinese (Traditional)” — not English — for the “Translate into” option. And then click on “Show romanization”.

For example, here’s what happens with the following text from an essay on simplified and traditional Chinese characters by Zhang Liqing:

????“?”?“?”?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????“?? ?? ?“ ?“?” ???????? ????????????????????408???

screenshot of Google Translate with the text above

Google Translate will produce this:
screenshot of Google Translate with the text above and how Google Translate puts this into Pinyin (see text below)

tán zh?ng guó de“y?“hé” wén” de wèn tí? w? jué de zuì h?o néng xi?n li?o jiè y? xià zài zh?ng guó t?ng yòng de y? yán?zh?ng guó de zh? yào y? yán y?u n? xi??wéi shèn me w? shu? zhè ge? ér bù shu? nà gè?y?n wèi huán jìng?y?n wèi bèi qi?ng pò?y?n wèi w? ài zhè ge y? yán?y?n wèi y?u bì yào?y?n wèi zhè ge y? yán h?n zhòng yào?y? xi?ng xi?ng shén me shì zh?ng guó rén de gòng tóng y? yán?yòng y? gè gòng tóng y? yán y?u bì yào ma?wèi shé me?bié de hàn y? de qù xiàng huì z?n me yàng?rú gu? n? sh? yòng zh?ng guó de gòng tóng y? yán p? t?ng huà? n? li?o ji? zhè ge y? yán de y? f??b? rú“de? de? de“ hé“le” de bù tóng yòng f?? ma?zh? dào zhè ge y? yán de j? b?n y?n jié?bù b?o kuò sh?ng diào? zh? y?u408gè ma?

Here’s what’s wrong:

  • This is all bro ken syl la bles instead of word parsing. (So it’s never even a question if they get the use of the apostrophe correct.)
  • Proper nouns are not capitalized (e.g., zhōng guó vs. Zhōngguó).
  • The first letter in each sentence is not capitalized.
  • Punctuation is not converted but remains in double-width Chinese style, which is wrong for Pinyin.
  • Spacing around most punctuation is also incorrect (e.g., although a space is added after a comma and a closing parenthesis, there’s no space after a period or a question mark. See also the spacing or lack thereof around quotation marks, numerals, etc.)
  • Because of lack of word parsing, some given pronunciations are wrong.
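The punctuation and spacing items in that list are the sort of thing a short script can handle. Here is a rough Python sketch; the mapping table and function name are my own, it covers only the most common marks, and it is deliberately naive (it would, for instance, put a space inside "3.5" or "12:30").

```python
import re

# Double-width Chinese punctuation and the ASCII marks Pinyin should use.
FULLWIDTH_TO_ASCII = {
    "，": ",", "。": ".", "？": "?", "！": "!",
    "：": ":", "；": ";", "（": "(", "）": ")", "、": ",",
}


def normalize_punctuation(text):
    """Convert double-width punctuation and tidy the spacing around it."""
    for wide, narrow in FULLWIDTH_TO_ASCII.items():
        text = text.replace(wide, narrow)
    # Add a space after punctuation that runs straight into the next word.
    text = re.sub(r'([,.?!:;)])(?=[^\s,.?!:;)\u201d"])', r"\1 ", text)
    # Add a space before an opening parenthesis that touches a word.
    text = re.sub(r"(?<=[^\s(])\(", " (", text)
    # Remove any space left in front of closing punctuation.
    text = re.sub(r"\s+([,.?!:;)])", r"\1", text)
    return text


print(normalize_punctuation("yǔ fǎ（bǐ rú）ma？zhī dào"))
```

Spacing around quotation marks and numerals would need further rules, and none of this addresses the bigger problems of word parsing and capitalization.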

In my previous post I complained about Google Maps’ unfortunately botched switch to Hanyu Pinyin. I stated there that, unlike Google Maps, Google Translate would correctly produce “Chengdu” from “成都” (which it does when “translate into” is set for English). But I see that the romanization “feature” of Google Translate also fails this simple test. It generates the incorrect “chéng dōu”.

All of this indicates that Google apparently is using a poor database and not only has no idea of how Pinyin is meant to be written but also lacks an understanding of even the basic rules of Pinyin.

If you should need to use a free Web-based Pinyin converter, avoid Google Translate. Instead use Adso (from the fine folk at Popup Chinese) or perhaps NCIKU or MDBG — all of which, despite their limitations (c’mon, guys, sentences begin with capital letters), are significantly better than what Google offers.

By the way, Google Translate will also romanize Japanese texts written in kanji and kana, Russian texts written in Cyrillic, etc. But I’ll leave those to others to analyze.

For lagniappe, here’s a real Hanyu Pinyin version of the text above:

Tán Zh?ngguó de “y?” hé “wén” de wèntí, w? juéde zuìh?o néng xi?n li?oji? y?xià zài Zh?ngguó t?ngyòng de y?yán. Zh?ngguó de zh?yào y?yán y?u n?xi?? Wèishénme w? shu? zhège, ér bù shu? nàge? Y?nwei huánjìng? Y?nwei bèi qi?ngpò? Y?nwei w? ài zhège y?yán? Y?nwei y?u bìyào? Y?nwei zhè ge y?yán h?n zhòngyào? Y? xi?ngxiang shénme shì Zh?ngguórén de gòngtóng y?yán? Yòng y?ge gòngtóng y?yán y?u bìyào ma? Weishenme? Biéde Hàny? de qùxiàng huì z?nmeyàng? Rúgu? n? sh?yòng Zh?ngguó de gòng tóng y?yán P?tónghuà, n? li?oji? zhège y?yán de y?f? (b?rú “de” hé “le” de bùtóng y?ngf?) ma? Zh?dao zhège y?yán de j?b?n y?njié (bù bàokuò sh?ngdiào) zh? y?u 408 ge ma?