Baidu adds handwriting input

Posted on Wednesday, June 16, 2010 by Pinyin Info

Baidu has just added a function that allows people to use their mouse to write Chinese characters for searches.

On the Baidu home page, click on “手写” (shǒuxiě/手寫/handwrite).

This will bring up a pop-up box in which you can use your mouse to write Hanzi. This functions in basically the same way as the mouse-writing tool that Nciku added about two years ago.

source: Baidu.com’s Search Box Now Supports Chinese Handwriting Input, China Tech News, June 16, 2010

Le Grand Ricci now available on DVD

Posted on Wednesday, May 19, 2010 by Pinyin Info

The magnificent Grand dictionnaire Ricci de la langue chinoise, better known as le Grand Ricci, has just been released on DVD, almost a decade after its release in book form and exactly four hundred years after the death of Matteo Ricci.

The list price is 120 euros (about US$150), which is much cheaper than the printed edition. A long video in French (16:31) discusses the work. For those who would prefer something in English, a PDF gives background information on the dictionary project.

For a sample of the dictionary’s format and entries, see the 25 pages of entries for shan. Alas, as this example shows, the entries are not word parsed. But at least Hanyu Pinyin is now available for those who prefer it to Wade-Giles.

As long as I’m mentioning Ricci-related work, I might as well use the occasion to note that the Taipei Ricci Institute is putting its collection of books on permanent loan to Taiwan’s National Central Library.

Also, I’d like to note that parts of Matteo Ricci’s original dictionary can now be viewed through the Google Books scan of a publication from earlier this century of his Dicionário Português-Chinês.

Enjoy.

image from a manuscript page of Ricci's original dictionary

OMG, it’s Hanzified English

Posted on Tuesday, April 13, 2010 by Pinyin Info

In Taiwan, the new movie Date Night has been given the Mandarin title Yuēhuì o mài gà (約會喔麥尬/约会喔麦尬).

Yuēhuì is simply the word for “date.” The interesting part is “o mài gà” (喔麥尬), which is a Mandarinized form of the English “oh my god.” (I wonder whether this, being written in Hanzi despite still being basically English, would satisfy China’s new demands for supposed linguistic purity.)

Most people here (especially those younger than about 40) would simply write “oh my god” (or, less frequently, “o my god”) in English in the middle of an otherwise Mandarin text. (I’ll spare everyone the chart of Google search results, but it backs this up.) But brevity is standard in movie titles here, and “喔麥尬” is a lot more compact on a movie poster than “oh my god.” This, however, raises the question of why the title uses “喔麥尬” instead of the equally concise “OMG”. I don’t know the answer to that. But the path of lettered words in Mandarin is certainly not without twists and turns.

Like most other uses of Hanzified English, the results are not entirely faithful to the original sounds.

Mandarin’s ou would be a closer phonetic fit than o for the English “oh”.
There’s Ōu (區/区), a surname. But most of the time this Chinese character is pronounced qū (being one of those many Chinese characters with multiple pronunciations), so that certainly wouldn’t work well. There’s ǒu, which has a more clearly phonetic Hanzi (嘔/呕), but which has to do with vomit (ǒutù/嘔吐/呕吐). Another possible choice would be ōu (歐/欧); but that is associated mainly with Europe and doesn’t get used much as a phonetic component in non-Europe-related loan words outside the word for ohm: ōumǔ (歐姆/欧姆).

Mài (the Mandarin word for wheat), unlike most other Mandarin morphemes pronounced mai (various tones), gets used phonetically in many loan words, such as Màidāngláo (McDonald’s/麥當勞/麦当劳), Màijiā (Mecca/麥加/麦加), Dānmài (Denmark/丹麥/丹麦), and Kāmàilóng (Cameroon/喀麥隆/喀麦隆). So its use is to be expected, though semantically there’s no link. And mài is certainly a better fit for the English my than it is for the Mc of McDonald’s, the Mec of Mecca, the mark of Denmark, or the me of Cameroon.

For ga there’s not a lot of choice. 咖 is often seen in the phonetic loan gālí (curry). The biggest problem here is that the same 咖 is also used as kā in a different, common phonetic loan: kāfēi (coffee). There’s 嘎; but, like 尬, it’s not exactly a well-known character.

Anyway, I could go on for a long time listing various possibilities. But the main point is that Chinese characters just don’t do well at this sort of thing.

As for Pinyin, I suppose the orthography could get interesting: o mài gà, o màigà, omài gà, or omàigà. But a Pinyin orthography would probably simply encourage people to write this in the original: oh my god.

BTW, you may wish to try the following experiment. The gà in o mài gà is most often seen in writing the word gāngà (尷尬/尴尬), which means awkward/embarrassed. Ask native speakers of Mandarin to write gāngà in Hanzi for you by hand without using a dictionary, a computer, or any other form of assistance. I bet that most people — even those with university degrees — won’t be able to write this common, ordinary word correctly.

And for lagniappe, the character 尬 is also sometimes seen in written Taiwanese as the equivalent of Mandarin’s jiā (加/add). I spotted an example of this just the other day on a cafe sign (in the sense of “buy something and ga something else for a special price”) but didn’t have a camera with me.

Combining Pinyin and Chinese character subtitles

Posted on Wednesday, April 7, 2010 by Pinyin Info

With any luck, this will be the last post for some time in my none too exciting but hopefully useful series on technical aspects of creating Pinyin subtitles.

Some people like to have Pinyin subtitles and Hanzi subtitles appear at the same time. Although I think that’s generally a bad idea (too much text to get through quickly that way, people would benefit from becoming accustomed to reading Pinyin texts as Pinyin texts, etc.), I’ll go ahead and offer instructions on how to make Pinyin subtitles appear above Chinese character subtitles.

These directions are for Microsoft Word, though other programs could be used instead.

Using Word, open copies of the two subtitle files you’d like to combine.

To get the alignment between the two files to match when they’re combined, it’s important that each subtitle entry is only one line long. You can check for possible instances of multi-line subtitles with a wildcard search (CTRL+H –> More –> Use wildcards).

Find what (with “Use wildcards” checked):
([!0-9])^13([!0-9^13])

If that search finds any multi-line subtitles, you’ll need to temporarily adjust those lines in both subtitle files, as follows:

Find what (with “Use wildcards” checked):
([!0-9])^13([!0-9^13])

Replace with:
\1|\2

Again, be sure to run that search-and-replace in both subtitle files. You’ll replace the “|” with a RETURN later.
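For readers comfortable with a scripting language, the same check-and-join step can also be done outside Word. The following is a minimal Python sketch, not part of the Word procedure; it assumes a standard .srt layout (index line, timing line, one or more text lines, blank line between entries), and the function name `join_multiline_subtitles` is purely illustrative.

```python
import re

def join_multiline_subtitles(srt_text, joiner="|"):
    """Join the text lines of each multi-line subtitle entry with `joiner`.

    Returns the rewritten SRT text and how many entries were joined.
    Assumes each entry is: index line, timing line, one or more text lines.
    """
    # Normalize Windows line endings and drop a UTF-8 byte-order mark, if any.
    srt_text = srt_text.replace("\r\n", "\n").lstrip("\ufeff")
    out, joined = [], 0
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.split("\n")
        header, text_lines = lines[:2], lines[2:]
        if len(text_lines) > 1:
            joined += 1
        body = [joiner.join(text_lines)] if text_lines else []
        out.append("\n".join(header + body))
    return "\n\n".join(out) + "\n", joined
```

Run it on both files, as with the Word replacement, so the entries stay aligned; a returned count of 0 means no multi-line entries were found.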

Next, in the file with the Chinese characters (not the Pinyin file) strip out everything except for the text of the subtitles, leaving just the Hanzi text. (I wrote about this earlier in How to strip subtitle files down to text. The method is also useful for removing such information if you want to create the text of the screenplay.)

Find what (with “Use wildcards” checked):
^13[0-9:\,\-\> ]{1,}^13

Replace with:
^p

Note: You may need to run the above “replace all” twice for Word to catch everything.

You should have something that looks like this (with paragraph marks shown):

1¶
喲! 李爺來啦¶
¶
李爺來啦¶
¶
秀蓮¶
¶
秀蓮¶
¶
秀蓮，李慕白來啦¶
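A rough Python equivalent of this stripping step, again assuming a well-formed .srt file in which every entry starts with an index line and a timing line (the function name is hypothetical). Unlike the Word search-and-replace, it drops the index of the first entry too, and it leaves one blank line between entries, which is also a convenient shape for a screenplay text.

```python
import re

def srt_text_only(srt_text):
    """Keep only subtitle text, with a blank line between entries."""
    srt_text = srt_text.replace("\r\n", "\n").lstrip("\ufeff")
    entries = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.split("\n")
        if len(lines) > 2:
            entries.append("\n".join(lines[2:]))  # drop index and timing lines
    return "\n\n".join(entries)
```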

Now add extra lines, so the lines with Chinese characters will fit into the new document in the correct places.

Find what (with “Use wildcards” checked):
^13^13

Replace with:
^p^p^p^p^p

Delete the very first line (the one with the “1” in it). Then add three blank lines above what is now the first line.

You should have something that looks like this (with paragraph marks shown):

¶
¶
¶
喲! 李爺來啦¶
¶
¶
¶
¶
李爺來啦¶
¶
¶
¶
¶
秀蓮¶

Select all (CTRL+A). Then convert this to a table:
Table –> Convert –> Text to Table

Now switch to the Pinyin subtitles file.

First, add the extra blank lines into which you will later insert the Chinese characters that correspond to the Pinyin.

Find what (with “Use wildcards” checked):
^13^13

Replace with:
^p^p^p

Convert the Pinyin subtitles to a table:
CTRL+A
Table –> Convert –> Text to Table

Switch back to the Chinese character file. Copy the table there and paste it to the right of the table with the Pinyin text.

You should have something that looks like this, with the Pinyin table in the left column and the pasted Chinese character table in the right column:

1
00:00:49,000 --> 00:00:51,500
Yō! Lǐ yé lái la
                                  喲! 李爺來啦

2
00:00:52,200 --> 00:00:53,600
Lǐ yé lái la
                                  李爺來啦

3
00:01:06,900 --> 00:01:08,400
Xiùlián
                                  秀蓮

4
00:01:09,000 --> 00:01:10,400
Xiùlián
                                  秀蓮

Next, change this back into text:
Table –> Convert –> Table to Text

Remove the tabs:
Find what:
^t

Replace with:
[leave blank]

If you combined any lines earlier, break them apart now:
Find what:
|

Replace with:
^p

Your document should now look like this:

1
00:00:49,000 --> 00:00:51,500
Yō! Lǐ yé lái la
喲! 李爺來啦

2
00:00:52,200 --> 00:00:53,600
Lǐ yé lái la
李爺來啦

3
00:01:06,900 --> 00:01:08,400
Xiùlián
秀蓮

4
00:01:09,000 --> 00:01:10,400
Xiùlián
秀蓮

Save the file as plain text (*.txt), not as a Word document (*.doc). Then later rename this to give it the correct file extension (probably *.srt).
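If you would rather skip the table juggling, a short script can produce the same combined file directly. This is a minimal Python sketch under two assumptions: both files are already synchronized and contain the same number of entries in the same order, and the Pinyin file’s index and timing lines are the ones to keep. Because it works entry by entry rather than line by line, multi-line subtitles need no temporary “|” joining. The function names are illustrative, not from any existing tool.

```python
import re

def parse_srt(srt_text):
    """Split SRT text into (index, timing, text_lines) tuples."""
    srt_text = srt_text.replace("\r\n", "\n").lstrip("\ufeff")
    entries = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.split("\n")
        if len(lines) >= 2:  # skip stray fragments
            entries.append((lines[0].strip(), lines[1].strip(), lines[2:]))
    return entries

def combine_srt(pinyin_srt, hanzi_srt):
    """Put each entry's Pinyin lines above its Chinese character lines."""
    pinyin_entries = parse_srt(pinyin_srt)
    hanzi_entries = parse_srt(hanzi_srt)
    if len(pinyin_entries) != len(hanzi_entries):
        raise ValueError("different numbers of subtitles; synchronize the files first")
    out = []
    for (index, timing, pinyin_lines), (_, _, hanzi_lines) in zip(pinyin_entries, hanzi_entries):
        out.append("\n".join([index, timing, *pinyin_lines, *hanzi_lines]))
    return "\n\n".join(out) + "\n"
```

Save the returned string as UTF-8 text with an .srt extension.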

How to create Hanyu Pinyin subtitles

Posted on Wednesday, March 24, 2010 by Pinyin Info

Since posting about the Pinyin subtitles for Crouching Tiger, Hidden Dragon and The Story of Stuff, I have received several messages asking how someone might make Pinyin subtitles themselves. So I might as well put the answer online.

Although at the present stage of software implementation subtitle conversion isn’t as simple as pushing a button, the process is not particularly difficult, assuming you have a good source text to work from. But this does require some time and the right tools.

The Right Tools

The most important tool is, of course, the one that performs the conversion to Hanyu Pinyin. And it’s crucial to keep in mind that not all Pinyin converters are created equal; in fact, the vast majority of so-called Pinyin converters are best avoided entirely. The world does not need any more texts in the hobbled, poorly written mess that many people erroneously think of as Hanyu Pinyin; but it very much needs texts in real Hanyu Pinyin. So don’t waste your time with a program that doesn’t do a good job of word parsing, etc.

At present the clear front-runner for converting Chinese characters to Hanyu Pinyin texts (real Hanyu Pinyin texts) with a minimum need for user assistance is Key Chinese (Windows and Mac). The demo version is fully functional for 30 days. Key’s considerably less expensive “Hanzi To Pinyin With Tones Conversion Utility” for MS Word texts (also with a 30-day demo) would probably also work well, though I haven’t tried it myself.

Wenlin (Windows and Mac) is another excellent program that can produce properly spelled and word-parsed Hanyu Pinyin. But it requires users to run some disambiguation themselves, which can take a lot of time when you’re talking about something with as much text as a screenplay. Nonetheless, Wenlin’s incorporation of John DeFrancis’s ABC Chinese-English Comprehensive Dictionary makes it a helpful reference when performing post-conversion checks. Also, especially if one does not have Key, Wenlin — even the function-limited but non-expiring demo version — is useful for handling some adjustments (such as removing tone marks or providing a workaround when dealing with programs that don’t handle Chinese characters well).

You’ll also need a Unicode-friendly text editor with good support for regular expressions (to allow wildcard searches). I like EmEditor, which is Windows-based. But lots of other programs would work. One could even use MS Word if so inclined.

Finally, having subtitles in an additional language (usually but not necessarily English) is often desirable, not just for others who would use these subtitles but for yourself as you create the Pinyin subtitles. But often the subtitles one may find in Mandarin are not in synch with those in another language. Software can fix this problem. But I don’t have enough experience with this to recommend certain programs over others.

To sum up, the tools I recommend for creating Hanyu Pinyin subtitles are

Key Chinese
Wenlin
EmEditor (or another Unicode-friendly text editor)
a subtitle synchronizer

Actually, just the first one, Key, is sufficient to produce Pinyin subtitles. But in my experience, using a combination of all four programs is preferable.

Now it’s time to get down to business.

The Main Steps

acquire source-version subtitles
synchronize subtitle files
identify names of the movie’s characters (dramatis personae)
perform initial conversion of subtitles in Chinese characters to Pinyin
double check the results and perform necessary cleanup
create additional version without tone marks
share your work

1. Acquire subtitles for conversion and reference

At present the most useful site for finding Mandarin subtitles written in Chinese characters is probably Shooter. You may need to try searching for your desired title in both simplified and traditional characters. Also, be aware that movies — especially movies not filmed in Mandarin — often have different names in China, Taiwan, Hong Kong, etc.

You may find it useful to look for subtitles in other languages, too. Shooter can be useful for that, though you may have better luck finding English subtitles at Opensubtitles.org or similar English-language sites.

One can often find different subtitle files for the same movie, so you may wish to examine more than one for quality. Another thing that’s worth keeping in mind: Converting from traditional Chinese characters to simplified Chinese characters is less problematic than vice versa.
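The asymmetry arises because several traditional characters often collapse into a single simplified one, so going the other way means choosing among candidates from context. A toy Python illustration with a deliberately tiny, hand-picked table (real converters rely on far larger tables plus word-level rules; the function names are illustrative):

```python
# Toy traditional -> simplified table (hand-picked sample only).
TRAD_TO_SIMP = {
    "發": "发",  # fā, "to send out"
    "髮": "发",  # fà, "hair"
    "後": "后",  # hòu, "after"
    "后": "后",  # hòu, "empress" (unchanged)
    "麵": "面",  # miàn, "noodles"
    "面": "面",  # miàn, "face" (unchanged)
}

def to_simplified(text):
    # One lookup per character; characters not in the table pass through.
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)

def traditional_candidates(simplified_char):
    # Reverse lookup: every traditional character that maps to this one.
    return [trad for trad, simp in TRAD_TO_SIMP.items() if simp == simplified_char]

print(to_simplified("理髮"))          # 理发
print(traditional_candidates("发"))   # ['發', '髮']
```

Going from traditional to simplified is a plain lookup; going back from 发 leaves two candidates that only context can decide between.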

2. Synchronize subtitle files

Once you have the files, you should synchronize them with each other according to the directions for the particular program you are using.
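When two files differ only by a constant delay, the adjustment is simple enough to sketch. This minimal Python example shifts every SRT timestamp by a fixed number of milliseconds. It does not handle gradual drift between files, and the function names are illustrative. Because it only rewrites the timestamp pattern, any Chinese characters in the text pass through untouched.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def _shift(match, offset_ms):
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)  # never before 0
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def shift_srt(srt_text, offset_ms):
    """Add offset_ms (may be negative) to every HH:MM:SS,mmm timestamp."""
    return TIMESTAMP.sub(lambda match: _shift(match, offset_ms), srt_text)
```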

If the program you’re using for this chokes on Chinese characters, though, you’ll need to take a couple of extra steps. First, convert the Chinese characters to Unicode numeric character references (NCRs) using either Pinyin Info’s NCR conversion tool or Wenlin (full or demo version). The reason for this is that even synchronizers that screw up “李慕白” should be able to handle the NCR equivalent: “&#26446;&#24917;&#30333;”.

In Wenlin,
Edit –> Make transformed copy –> Encode &#; [decimal]

Take the NCR text and synchronize the files. After you get this taken care of, reconvert to Chinese characters.

In Wenlin,
Edit –> Make transformed copy –> Decode &#;
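The NCR round trip is also easy to reproduce in a few lines if neither Wenlin nor the online tool is at hand. A minimal Python sketch (function names are illustrative): it encodes every non-ASCII character, including any tone-marked vowels, as a decimal reference, and decodes only decimal references on the way back, so other ampersand sequences in the text are left alone.

```python
import re

DECIMAL_NCR = re.compile(r"&#(\d+);")

def encode_ncr(text):
    # ASCII (indexes, timestamps, punctuation) stays as-is; everything else becomes &#NNNNN;
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

def decode_ncr(text):
    # Turn each decimal reference back into the character it names.
    return DECIMAL_NCR.sub(lambda m: chr(int(m.group(1))), text)

print(encode_ncr("李慕白"))  # &#26446;&#24917;&#30333;
```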

3. Identify names of the movie’s characters

You must teach your software to recognize which strings of Hanzi represent names. For example, it’s crucial for clarity that the character name “李慕白” be written “Lǐ Mùbái” rather than “lǐ mù bái”. This part takes some time up front. But do not skip this step: it is not only crucial but will save a lot of trouble in the long run.

Before doing this, however, people may want to refamiliarize themselves with Hanyu Pinyin’s rules for proper nouns (PDF). Note especially what is supposed to be capitalized and what isn’t.

The Mandarin version of Wikipedia is one resource that can be helpful in identifying the names of at least the main characters in the movie. But you’ll want to look for more names and forms than will be listed there. Keep in mind that characters aren’t always addressed by their full names. You need to look for other forms as well (e.g., in Crouching Tiger, Hidden Dragon, Li Mubai is sometimes referred to as “Li Mubai” but other times as “Li ye” or simply as “Mubai”) and enter them too.

English subtitles can be very useful for locating most proper nouns in the text. (Hooray for word parsing and capitalization of proper nouns!) The following search of an English subtitle file should help pinpoint the location of proper nouns.

In EmEditor, find (with “Match Case” and “Use Regular Expressions” checked):
[^\.]\s[A-Z][a-z]

In MS Word, find (with “Use wildcards” checked):
[!\.] [A-Z][a-z]

Since you’ve already synchronized your subtitles, you’ll easily be able to find the corresponding point in the Mandarin subtitles by looking at the time the line appears.
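The same search can be scripted so that each candidate comes back paired with the start time of its subtitle, ready to look up in the Mandarin file. A minimal Python sketch, assuming a standard .srt English file; like the patterns above, it skips capitalized words right after a period but will still flag sentence-initial words after “?” or “!”, and it looks only within a single subtitle entry. The function name is illustrative.

```python
import re

# A capitalized word after whitespace, where the character before the
# whitespace is neither a period nor more whitespace.
CAPITALIZED = re.compile(r"(?<=[^.\s])\s+([A-Z][a-z]+)")

def proper_noun_candidates(srt_text):
    """Return (start_time, word) pairs for likely proper nouns."""
    srt_text = srt_text.replace("\r\n", "\n").lstrip("\ufeff")
    hits = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.split("\n")
        if len(lines) < 3:
            continue
        start_time = lines[1].split(" --> ")[0]
        text = " ".join(lines[2:])
        for match in CAPITALIZED.finditer(text):
            hits.append((start_time, match.group(1)))
    return hits
```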

As you gather the names, or after you compile the full list, add your findings to the Pinyin converter’s user dictionary. In Key, perform Language –> Add Record, then fill in the Hanzi and Pinyin fields.

4. Perform initial conversion to Pinyin

OK, I know you’re eager to run the conversion and see all of those Hanzi turn into lovely Hanyu Pinyin. But there’s one quick step you need to do first. If you’re using Key Chinese, the program won’t make use of all of those character names you just painstakingly added to the user dictionary unless you first run “linguistic reconstruction” on the subtitles you wish to convert:
Language –> Linguistic Reconstruction

Now you’re ready for the big step:
Language –> Convert to Pinyin

5. Double check the results and perform necessary cleanup

Unfortunately, most Pinyin converters — even the best — tend to be lazy about inserting spaces in some of the places they belong, such as around numeric and alphabetic strings. For example, “自3月22日（星期一）起至5月31日（星期一）” will generally convert to something that looks like this:
“zì3yuè22rì (Xīngqīyī) qǐ zhì5yuè31rì (Xīngqīyī)”.
But it should look like this:
“zì 3 yuè 22 rì (Xīngqīyī) qǐ zhì 5 yuè 31 rì (Xīngqīyī)”.

To fix this in your Pinyin text, run the following regular expression in EmEditor. Make sure “Match Case” is not checked.
find:
([a-zāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ])([0-9]+)([a-zāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ])

replace:
\1 \2 \3

If you do this in Word, you’ll need to use the following instead in your wildcard search.
find:
([A-Za-zĀÁǍÀĒÉĚÈĪÍǏÌŌÓǑÒŪÚǓÙǕǗǙǛāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ])([0-9]{1,})([A-Za-zĀÁǍÀĒÉĚÈĪÍǏÌŌÓǑÒŪÚǓÙǕǗǙǛāáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ])

replace:
\1 \2 \3
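For those working in a scripting language rather than an editor, here is a rough Python counterpart (a sketch, not a feature of EmEditor or Word). The class `[^\W\d_]` means “any Unicode letter,” so uppercase and tone-marked vowels are covered without listing them. Using lookarounds for the neighboring letters means they are not consumed, so a letter shared by two adjacent matches is still handled, a rare case in Pinyin that a plain Replace All may miss. The function name is illustrative.

```python
import re

# Digits with a letter directly on each side; the letters themselves are not consumed.
DIGITS_BETWEEN_LETTERS = re.compile(r"(?<=[^\W\d_])(\d+)(?=[^\W\d_])")

def space_numbers(pinyin):
    """Put spaces around numbers glued to Pinyin syllables."""
    return DIGITS_BETWEEN_LETTERS.sub(r" \1 ", pinyin)
```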

The rest of the cleanup usually involves simply reading through the text and looking for errors, perhaps while listening to the movie.

6. Create additional version without tone marks

If you have Key, this is very easy: Highlight the entire text, then
Format –> Strip Tone Marks.

And you’re done. Key keeps ü as such, though, so if your television or other device doesn’t display the letter ü correctly, you may wish to convert “ü” to “v”.

If you don’t have Key or access to another program that can do the same thing as easily, then use a combination of Wenlin (again, even the demo will do what you need) and a text editor. First, paste your Pinyin text into Wenlin. Then select all of the text and perform
Edit –> Make transformed copy… –> Replace tone marks with 1-4

Copy and paste the results into a new document in your text editor. Then run the following search-and-replace. Make certain the “Use Regular Expressions” or “Use Wildcards” box is checked.

find:
([A-Za-z])([1-4])

replace with:
\1
Then click “Replace All”.
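A different route, sketched in Python for anyone who has neither Key nor Wenlin handy: decompose the text with Unicode normalization (NFD), which splits each tone-marked vowel into a base letter plus combining marks, drop the combining marks except the diaeresis of ü, and recompose (NFC). The optional ü-to-v swap covers devices that can’t display ü. This is a sketch under stated assumptions, and the function name is illustrative; note that it also removes any other accents in the text, such as the circumflex of ê.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"  # the two dots of ü

def strip_tone_marks(pinyin, u_as_v=False):
    # NFD turns e.g. "ǚ" into "u" + U+0308 (diaeresis) + U+030C (caron).
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) or ch == COMBINING_DIAERESIS
    )
    result = unicodedata.normalize("NFC", kept)  # recompose u + diaeresis into ü
    if u_as_v:
        result = result.replace("ü", "v").replace("Ü", "V")
    return result
```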


7. Share your work

It’s much better if people can concentrate on producing new material rather than having to redo things others have already taken care of. So if you make a good Hanyu Pinyin version of something, please let me know.

How to learn real Mandarin: an anecdote

Posted on Thursday, January 28, 2010 by Pinyin Info

The following is a guest post by Professor Victor H. Mair of the University of Pennsylvania’s Department of East Asian Languages and Civilizations.

The personal names used in the original correspondence have been changed to generational designations.

In contrast to the Hànzì-centric pedagogical approach, which forces little children to memorize extremely difficult and complicated characters like 老鼠 and 蝴蝶 instead of teaching them lǎoshǔ and húdié, today I received some more hopeful and sane news.

A friend of mine is teaching her grandson Mandarin. The way she is doing it is to write out the Xī yóu jì (Journey to the West) in a simple báihuà paraphrase using Pinyin only (with glosses in English for new vocabulary). My friend is a first-generation immigrant to America, and her daughter married a German who was studying in the United States, so that makes the grandson third-generation Chinese-American/German.

The other day, the grandson asked his mom out of the blue: “What’s the difference between shíjiān, shídài, and shífèn?” My friend, the grandmother, explained to me that all of these terms were in the Pinyin text that she had prepared for her grandson, and that she had glossed them as “time” or “period.” She said that the boy’s mother was very pleased, and she was tickled too, because the boy had discerned the common element shí by himself. As my friend (the grandmother) put it, “He spends very little time on Chinese, so we were pleasantly surprised.”

Hearing this account from my friend, I wrote to her: “Thank you so much for the TRULY WONDERFUL story you wrote about your grandson. This is how to learn real Chinese!!!! And you are being a real Chinese teacher to teach your grandson this way. And I’m also happy that your daughter appreciates what you and her son are doing together. Tell your grandson I’m really impressed at the intelligence of his question.”

A new look at early character forms

Posted on Wednesday, December 2, 2009 by Pinyin Info

A review in a recent journal issue focusing on romanization led me to discover online the entire text of an interesting new book: Orthography of Early Chinese Writing: Evidence from Newly Excavated Manuscripts, by Imre Galambos.

This gives an idea of what the book covers:

Beside offering a more useful approach to both studying Warring States manuscripts and variant character forms in general, this study sheds new light on the development of the Chinese script, its transition into the clerical script stage, and the reality of the Qin reforms. The variability of Warring States character forms demonstrates that Chinese characters evolved not along a linear path that stretched from the oracle-bone inscriptions to the modern script but followed a complex process involving distinct cultures and languages. The “fuzziness” of the line of evolution with respect to the spoken languages and dialects of ancient China raises questions regarding the national identity of the Chinese script. A related issue is how far can one go back in time and say with certainty that the various scripts were not only the predecessors of the Chinese script but were in fact Chinese.

Some numbers for searches:

ISBN 963 463 811 2
ISSN 1787-7482

Google Translate’s new Pinyin function sucks

Posted on Wednesday, November 18, 2009 by Pinyin Info

Google Translate has a new function: conversion to Hanyu Pinyin, which would be exciting and wonderful if it were any good. But unfortunately it’s terrible, all things considered.

What Google has created is at about the same level as the scripts that hobbyists cobbled together the hard way a decade ago from early versions of CE-DICT. Don’t get me wrong: I greatly admire what sites such as Ocrat achieved way back when. But for Google, with all of its data, talent, and money, to do essentially no better so many years later is nothing short of a disgrace.

To see Google Translate’s Pinyin function in action you must select “Chinese (Simplified)” or “Chinese (Traditional)” — not English — for the “Translate into” option. And then click on “Show romanization”.

For example, here’s what happens with the following text from an essay on simplified and traditional Chinese characters by Zhang Liqing:

談中國的“語”和“文”的問題，我覺得最好能先了解一下在中國通用的語言。中國的主要語言有哪些？為甚麼我說這個，而不說那個？因為環境？因為被強迫？因為我愛這個語言？因為有必要？因為這個語言很重要？也想想什麼是中國人的共同語言。用一個共同語言有必要嗎？為什麼？別的漢語的去向會怎麼樣？如果你使用中國的共同語言普通話，你了解這個語言的語法（比如“的，得，地“ 和“了” 的不同用法）嗎？知道這個語言的基本音節（不包括聲調）只有408個嗎？


Google Translate will produce this:

tán zhōng guó de“yǔ“hé” wén” de wèn tí， wǒ jué de zuì hǎo néng xiān liǎo jiè yī xià zài zhōng guó tōng yòng de yǔ yán。zhōng guó de zhǔ yào yǔ yán yǒu nǎ xiē？wéi shèn me wǒ shuō zhè ge， ér bù shuō nà gè？yīn wèi huán jìng？yīn wèi bèi qiǎng pò？yīn wèi wǒ ài zhè ge yǔ yán？yīn wèi yǒu bì yào？yīn wèi zhè ge yǔ yán hěn zhòng yào？yě xiǎng xiǎng shén me shì zhōng guó rén de gòng tóng yǔ yán。yòng yī gè gòng tóng yǔ yán yǒu bì yào ma？wèi shé me？bié de hàn yǔ de qù xiàng huì zěn me yàng？rú guǒ nǐ shǐ yòng zhōng guó de gòng tóng yǔ yán pǔ tōng huà， nǐ liǎo jiě zhè ge yǔ yán de yǔ fǎ（bǐ rú“de， de， de“ hé“le” de bù tóng yòng fǎ） ma？zhī dào zhè ge yǔ yán de jī běn yīn jié（bù bāo kuò shēng diào） zhǐ yǒu408gè ma？

Here’s what’s wrong:

This is all bro ken syl la bles instead of word parsing. (So the question of whether they use the apostrophe correctly never even arises.)
Proper nouns are not capitalized (e.g., zhōng guó vs. Zhōngguó).
The first letter in each sentence is not capitalized.
Punctuation is not converted but remains in double-width Chinese style, which is wrong for Pinyin.
Spacing around most punctuation is also incorrect (e.g., although a space is added after a comma and a closing parenthesis, there’s no space after a period or a question mark. See also the spacing or lack thereof around quotation marks, numerals, etc.)
Because of the lack of word parsing, some pronunciations are wrong (e.g., liǎo jiè instead of liǎojiě for 了解).
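A few of these problems are mechanical and could be patched after the fact; word parsing and proper-noun capitalization cannot, since they require a dictionary. The following minimal Python sketch, offered only as an illustration, handles two items from the list: it converts common full-width punctuation to ASCII with conventional spacing, and it capitalizes the first letter of each sentence. It ignores quotation marks and leaves the broken syllables as they are. The function name is illustrative.

```python
import re

FULLWIDTH = {"，": ", ", "。": ". ", "？": "? ", "！": "! ",
             "：": ": ", "；": "; ", "（": " (", "）": ") "}

def tidy_punctuation(text):
    for full, ascii_form in FULLWIDTH.items():
        text = text.replace(full, ascii_form)
    text = re.sub(r"\s+", " ", text)               # collapse repeated spaces
    text = re.sub(r"\s+([,.?!:;)])", r"\1", text)  # no space before closing marks
    text = re.sub(r"\(\s+", "(", text)             # no space after "("
    text = text.strip()
    # Capitalize the first letter of the text and of each new sentence.
    return re.sub(r"(^|[.?!]\s+)([^\W\d_])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
```

Run on “nǐ liǎo jiě zhè ge yǔ fǎ（bǐ rú） ma？”, it yields “Nǐ liǎo jiě zhè ge yǔ fǎ (bǐ rú) ma?”: better punctuated, but still not word-parsed.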

In my previous post I complained about Google Maps’ unfortunately botched switch to Hanyu Pinyin. I stated there that, unlike Google Maps, Google Translate would correctly produce “Chengdu” from “成都” (which it does when “translate into” is set for English). But I see that the romanization bug (er, feature) of Google Translate also fails this simple test. It generates the incorrect “chéng dōu”.

All of this indicates that Google apparently is using a poor database and not only has no idea of how Pinyin is meant to be written but also lacks an understanding of even the basic rules of Pinyin.

If you should need to use a free Web-based Pinyin converter, avoid Google Translate. Instead use Adso (from the fine folk at Popup Chinese) or perhaps NCIKU or MDBG — all of which, despite their limitations (c’mon, guys, sentences begin with capital letters), are significantly better than what Google offers.

By the way, Google Translate will also romanize Japanese texts written in kanji and kana, Russian texts written in Cyrillic, etc. But I’ll leave those to others to analyze.

For lagniappe, here’s a real Hanyu Pinyin version of the text above:

Tán Zhōngguó de “yǔ” hé “wén” de wèntí, wǒ juéde zuìhǎo néng xiān liǎojiě yīxià zài Zhōngguó tōngyòng de yǔyán. Zhōngguó de zhǔyào yǔyán yǒu nǎxiē? Wèishénme wǒ shuō zhège, ér bù shuō nàge? Yīnwei huánjìng? Yīnwei bèi qiǎngpò? Yīnwei wǒ ài zhège yǔyán? Yīnwei yǒu bìyào? Yīnwei zhège yǔyán hěn zhòngyào? Yě xiǎngxiang shénme shì Zhōngguórén de gòngtóng yǔyán. Yòng yīge gòngtóng yǔyán yǒu bìyào ma? Wèishénme? Biéde Hànyǔ de qùxiàng huì zěnmeyàng? Rúguǒ nǐ shǐyòng Zhōngguó de gòngtóng yǔyán Pǔtónghuà, nǐ liǎojiě zhège yǔyán de yǔfǎ (bǐrú “de, de, de” hé “le” de bùtóng yòngfǎ) ma? Zhīdao zhège yǔyán de jīběn yīnjié (bù bāokuò shēngdiào) zhǐ yǒu 408 ge ma?

Pinyin News

news and discussions mainly related to Chinese characters and romanization
