iOS app for writing Pinyin with tone marks

Those of you who, unlike me, own an iPhone, an iPad, or an iPod Touch may find the new Pinyin Typist app useful.

Taffy of Tailingua had a look at this for me.

I’ve had a play with the Pinyin application and I’m generally quite positive about it. It’s clean, unfussy, and gets the job done. The automatic positioning looks to be flawless (i.e. typing zhuang1 gives you zhuāng, not zhūang)…. Overall though I like it, as it does what it set out to do without any showboating or unnecessary steps (excepting apostrophes).

Although I wish the apostrophe and hyphen were right there on the main screen instead of on a secondary one, the program allows people to do what they need to do: type Pinyin with tone marks.
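Taffy's zhuang1 example follows the conventional tone-mark placement rule: the mark goes on a or e if either is present, on the o of ou, and otherwise on the last vowel. Nothing in this post says how Pinyin Typist does it internally, so the following is only a minimal Python sketch of that rule, assuming lowercase input with a tone number from 1 to 5 (5 or no digit meaning the neutral tone) and letting v or u: stand in for ü. The function name is hypothetical.

```python
import re
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"


def mark_tone(numbered: str) -> str:
    """Convert a lowercase numbered syllable such as 'zhuang1' to 'zhuāng'."""
    match = re.fullmatch(r"([a-zü:]+)([1-5]?)", numbered)
    if match is None:
        raise ValueError(f"not a numbered Pinyin syllable: {numbered!r}")
    syllable = match.group(1).replace("u:", "ü").replace("v", "ü")
    tone = int(match.group(2) or 5)
    if tone == 5:
        return syllable  # neutral tone: no mark
    # Placement rule: a or e if present, the o of ou, otherwise the last vowel.
    if "a" in syllable:
        pos = syllable.index("a")
    elif "e" in syllable:
        pos = syllable.index("e")
    elif "ou" in syllable:
        pos = syllable.index("o")
    else:
        vowel_positions = [i for i, ch in enumerate(syllable) if ch in VOWELS]
        if not vowel_positions:
            raise ValueError(f"no vowel to mark in {numbered!r}")
        pos = vowel_positions[-1]
    # Insert the combining mark after the chosen vowel, then compose to NFC.
    marked = syllable[: pos + 1] + TONE_MARKS[tone] + syllable[pos + 1 :]
    return unicodedata.normalize("NFC", marked)
```

With this rule, "gui4" becomes guì and "liu2" becomes liú, since neither contains a, e, or ou.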

It sells for US$2.99 (previously US$3.99).

[Headline changed from “Mac app for writing Pinyin with tone marks”]

Pinyin’s never-used letter?

As most people reading this blog know, Mandarin has about 1,300 syllables (interjections and loan words complicate the count a little). If tones, a basic part of the language, are disregarded, the number drops to 400-something syllables.

Given 410 or so basic syllables and 4 tones — one of these days I need to write something more on the wrongful neglect of the so-called neutral tone — some people might expect there to be more like 1,640 syllables instead of about 1,300. The reason for the lower number is that not all syllables exist in all four tones. For example, quite clearly the official language of Zhōngguó does not lack zhōng … or zhǒng or zhòng. But zhóng is another matter.

So not all possible tonal variations of those 400-something syllables appear in modern standard Mandarin. But what about letters?

If you look at the official alphabet for Hanyu Pinyin, it’s exactly the same as that for English (other than in pronunciation, of course), which is a bit odd, especially considering that Pinyin doesn’t use the letter v (or at least isn’t supposed to for Mandarin words).

So in this case, I’m excluding v but otherwise being expansionist about the glyphs I’m calling letters. To be specific: I’m referring to a-z, minus v, but including ā, á, ǎ, à, ē, é, ě, è, ī, í, ǐ, ì, ō, ó, ǒ, ò, ū, ú, ǔ, ù, ü, ǖ, ǘ, ǚ, and ǜ. (Even though Ī, Í, Ǐ, Ì, Ū, Ú, Ǔ, Ù, Ü, Ǖ, Ǘ, Ǚ, and Ǜ never come at the beginning of a word, let’s not automatically eliminate them, because there is an occasional need for ALL CAPS.)
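For anyone (font designers especially) who wants to check a face against this inventory, the list can be generated rather than typed. A minimal Python sketch: it builds each tone-marked vowel from a base letter plus a Unicode combining mark (macron, acute, caron, and grave for tones 1 to 4) and normalizes to NFC, which here yields one precomposed code point per glyph, in both lowercase and ALL CAPS. The function name is just for illustration.

```python
import string
import unicodedata

# Combining tone marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = ("\u0304", "\u0301", "\u030c", "\u0300")


def pinyin_letters() -> list[str]:
    """a-z minus v, the twenty tone-marked a/e/i/o/u forms, and ü plus its four."""
    plain = [c for c in string.ascii_lowercase if c != "v"]
    toned = [unicodedata.normalize("NFC", v + m) for v in "aeiou" for m in TONE_MARKS]
    u_umlaut = ["ü"] + [unicodedata.normalize("NFC", "ü" + m) for m in TONE_MARKS]
    return plain + toned + u_umlaut


letters = pinyin_letters()
capitals = [g.upper() for g in letters]  # for the occasional ALL CAPS text
print(len(letters), "lowercase glyphs:", "".join(letters[25:]))
```

Normalizing matters because the same letter can arrive either precomposed or as a base letter followed by combining marks; a font meant for Pinyin should handle both.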

Are there any of those possible glyphs that don’t appear at all — at least as given in the large ABC Comprehensive Chinese-English Dictionary?

The answer, perhaps surprisingly, is yes.

Which letter is it?

a. ǖ b. ǘ c. ǚ d. ǜ

Have you made your choice?

It doesn’t take much thought to eliminate C as the answer. “Nǚ” (woman) is one of those first-couple-of-Mandarin-lessons vocabulary terms. And the word for green (lǜsè) is hardly obscure either. It might be harder to think of a word with the letter ǘ, but there are some. Donkey (lǘ) is probably the most common. So the answer is A: ǖ.

It’s important to note that the lack of ǖ is in appearance only. The sound ǖ occurs in plenty of Mandarin words; it’s just that Pinyin’s simplified orthography calls for writing “u” instead where ǖ follows j, q, x, or y.
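That spelling convention is mechanical enough to express in a few lines. A minimal Python sketch, assuming one syllable at a time: after j, q, x, or y, drop the diaeresis and keep any tone mark; after other initials (l and n, for example, where lǜ and lù are different syllables) leave ü alone. The function name is hypothetical.

```python
import unicodedata


def simplify_u_umlaut(syllable: str) -> str:
    """Write ü (with or without a tone mark) as u after j, q, x, or y."""
    if not syllable or syllable[0].lower() not in "jqxy":
        return syllable
    # In NFD, ü becomes u + U+0308 (combining diaeresis); deleting the
    # diaeresis leaves any tone mark attached to the plain u.
    decomposed = unicodedata.normalize("NFD", syllable)
    return unicodedata.normalize("NFC", decomposed.replace("\u0308", ""))
```

So "jǖ" comes out as jū, which is exactly why the glyph ǖ can be absent from the dictionary even though the sound is common.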

But even though I didn’t find an example of ǖ, I’d encourage font designers not to scratch it from their list of must-have glyphs for Pinyin faces, especially since teachers will no doubt want to continue giving tone-pattern drills based on four tones for all vowels, regardless. Also, someone with a searchable edition of the Hanyu Da Cidian or maybe the new Oxford online edition is probably about to use the comments to point me to some obscure entry there….
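Anyone with a machine-readable dictionary can automate this kind of check. A minimal Python sketch, assuming a plain list of Pinyin words; the five-word sample is made up for illustration and is not drawn from the ABC dictionary or the Hanyu Da Cidian, so a real run would substitute a full headword list. Function names are hypothetical.

```python
import unicodedata

TONE_MARKS = ("\u0304", "\u0301", "\u030c", "\u0300")  # tones 1-4
U_WITH_TONES = [unicodedata.normalize("NFC", "ü" + m) for m in TONE_MARKS]


def glyph_counts(words, glyphs):
    """Count occurrences of each glyph in an iterable of Pinyin words."""
    counts = dict.fromkeys(glyphs, 0)
    for word in words:
        # Normalize first so decomposed input (u + U+0308 + tone mark) counts too.
        for ch in unicodedata.normalize("NFC", word).lower():
            if ch in counts:
                counts[ch] += 1
    return counts


def missing_glyphs(words, glyphs):
    """Return the glyphs that never occur in the word list, in input order."""
    counts = glyph_counts(words, glyphs)
    return [g for g in glyphs if counts[g] == 0]


# Hypothetical sample word list, for illustration only.
sample = ["nǚ", "lǜsè", "lǘ", "xū", "jùzi"]
print(missing_glyphs(sample, U_WITH_TONES))
```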

How to handle ‘de’ and interjections in Hanyu Pinyin

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) deals with how to write Mandarin’s various de’s, mood particles, and interjections.

This reading is available in two versions, one in simplified Chinese characters and one in traditional characters.

I’ve already written about the principles in previous posts. For example, see

How to write numbers and measure words in Hanyu Pinyin

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) is about writing numbers and measure words.

This reading is available in two versions, one in simplified Chinese characters and one in traditional characters.

For more on this, see these posts and the PDFs linked to therein.

How to write verbs in Hanyu Pinyin (Mandarin text)


Here’s the first of several selected readings from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》). It covers the writing of verbs.

This reading is available in two versions, one in simplified Chinese characters and one in traditional characters.

For those who would like to read about this in English, see

Important book on Pinyin to be excerpted on this site

Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) is the second of Yin Binyong’s two books on Pinyin orthography. The first, Chinese Romanization: Pronunciation and Orthography, is in English and Mandarin; much of it is already available here on Pinyin.Info.

Although Xinhua Pinxie Cidian is only in Mandarin, the large number of examples makes it easy to get the point even if you don’t read Mandarin in Chinese characters very well.

This week I will begin posting some excerpts from this invaluable work. What’s more, I have made a version in traditional Chinese characters, which I hope that readers in Taiwan, Hong Kong, and elsewhere will take advantage of. So those not used to reading simplified Chinese characters will have a choice (which is more than the government of Taiwan is providing these days).

I’m extremely happy to be able to bring you this information and wish to acknowledge the generosity of the Commercial Press. Stay tuned.

Google Translate and romaji revisited

OK, Google has improved its Pinyin converter some, though it still fails in important areas. So that’s the present situation for Google and Mandarin.

How about for Google and Japanese?

Professor J. Marshall Unger of the Ohio State University’s Department of East Asian Languages and Literatures generously agreed to reexamine Google’s performance in conversions to rōmaji (Japanese written in romanization).

Below is his latest evaluation.

For his initial analysis (in December 2009), see Google Translate and rōmaji.

I ran the test passage through Google Translate again. There’s some improvement, but it’s still pretty mediocre.

Original:

6日午後4時35分ごろ、東京都千代田区皇居外苑の都道(内堀通り)の二重橋前交差点で、中国からの観光客の40代の男性が乗用車にはねられ、全身を強く打って間もなく死亡した。車は歩道に乗り上げて歩いていた男性(69)もはね、男性は頭を強く打って意識不明の重体。丸の内署は、運転していた東京都港区白金3丁目、会社役員高橋延拓容疑者(24)を自動車運転過失傷害の疑いで現行犯逮捕し、容疑を同致死に切り替えて調べている。

同署によると、死亡した男性は横断歩道を歩いて渡っていたところを直進してきた車にはねられた。車は左に急ハンドルを切り、車道と歩道の境に置かれた仮設のさくをはね上げ、歩道に乗り上げたという。さくは歩道でランニングをしていた男性(34)に当たり、男性は両足に軽いけが。

同署は、死亡した男性の身元確認を進めるとともに、当時の交差点の信号の状況を調べている。

現場周辺は東京観光のスポットの一つだが、最近はジョギングを楽しむ人も増えている。

Google Translate:

6-Nichi gogo 4-ji 35-fun-goro, Tōkyō-to Chiyoda-ku Kōkyogaien no todō (uchibori-dōri) no Nijūbashi zen kōsaten de, Chūgoku kara no kankō kyaku no 40-dai no dansei ga jōyōsha ni hane rare, zenshin o tsuyoku Utte mamonaku shibō shita. Kuruma wa hodō ni noriagete aruite ita dansei (69) mo hane, dansei wa atama o tsuyoku utte ishiki fumei no jūtai. Marunouchi-sho wa, unten shite ita Tōkyō-to Minato-ku hakkin 3-chōme, kaisha yakuin Takahashi nobe Tsubuse yōgi-sha (24) o jidōsha unten kashitsu shōgai no utagai de genkō-han taiho shi, yōgi o dō chishi ni kirikaete shirabete iru.

Dōsho ni yoru to, shibō shita dansei wa ōdan hodō o aruite watatte ita tokoro o chokushin shite kita kuruma ni hane rareta. Kuruma wa hidari ni kyū handoru o kiri, shadō to hodō no sakai ni oka reta kasetsu no saku o haneage, hodō ni noriageta toyuu. Saku wa hodō de ran’ningu o shite ita dansei (34) niatari, dansei wa ryōashi ni karui kega.

Dōsho wa, shibō shita dansei no mimoto kakunin o susumeru totomoni, tōji no kōsaten no shingō no jōkyō o shirabete iru.

Genba shūhen wa Tōkyō kankō no supotto no hitotsudaga, saikin wa jogingu o tanoshimu hito mo fuete iru.

Notes:

  • The use of numerals dodges a plethora of errors, but “6-Nichi” is still wrong for Muika.
  • Lots of correct capitalizations have been added, but “uchibori” was missed and “Utte” capitalized by mistake.
  • Some false spaces or lack of spaces persist: “hane rare”, “oka reta”; “hitotsudaga” and “niatari” were correctly hitotsu da ga and ni atari in the original test.
  • Names still get butchered (“hakkin” for Shirogane, “nobe Tsubuse” for Nobuhiro).
  • The needless apostrophe in “ran’ningu” is still there.
  • Interestingly, “toyuu” is a new error: it should be to iu.
  • There’s evidence of some attempt to use hyphens, but why not in “kankō kyaku” or “Nijūbashi zen”?
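The ran’ningu note turns on Hepburn's rule for syllabic n (ん): an apostrophe follows it only when the next letter is a vowel or y (as in kin’yōbi), where the n could otherwise be read as the start of the next syllable. Before another consonant, as in ranningu, no apostrophe is needed. A minimal Python sketch, assuming the text has already been split into kana-sized syllables with syllabic n written as a lone "n"; that input format and the function name are hypothetical.

```python
import unicodedata

NEEDS_APOSTROPHE_BEFORE = frozenset("aiueoy")


def join_hepburn(syllables, apostrophe="’"):
    """Join romanized kana syllables, adding an apostrophe after syllabic n
    only when the following syllable begins with a vowel or y."""
    out = []
    for i, syl in enumerate(syllables):
        out.append(syl)
        if syl == "n" and i + 1 < len(syllables):
            # NFD strips macrons off the first letter (ō -> o + U+0304).
            nxt = unicodedata.normalize("NFD", syllables[i + 1])[:1].lower()
            if nxt in NEEDS_APOSTROPHE_BEFORE:
                out.append(apostrophe)
    return "".join(out)
```

For ラ・ン・ニ・ン・グ, join_hepburn(["ra", "n", "ni", "n", "gu"]) gives ranningu with no apostrophe.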

So, to update: Google gets kudos for conscientiousness, but I stick by my original comments.

For more by Prof. Unger, see Pinyin.info’s recommended readings, which includes selections from The Fifth Generation Fallacy: Why Japan Is Betting Its Future on Artificial Intelligence, Literacy and Script Reform in Occupation Japan: Reading Between the Lines, and Ideogram: Chinese Characters and the Myth of Disembodied Meaning.

Google Translate’s Pinyin converter revisited

When Google Translate’s Pinyin converter was first released about a year and a half ago, it sucked. Wow, did it ever suck. Since then, however, Google has instituted some changes. So it seems about time this was reexamined.

Fortunately, Google’s Pinyin converter is now much better than before.

Here’s the sort of FUBAR romanization — it certainly doesn’t deserve to be called Hanyu Pinyin — Google used to produce:

tán zhōng guó de“yǔ“hé” wén” de wèn tí, wǒ jué de zuì hǎo néng xiān liǎo jiè yī xià zài zhōng guó tōng yòng de yǔ yán。… rú guǒ nǐ shǐ yòng zhōng guó de gòng tóng yǔ yán pǔ tōng huà, nǐ liǎo jiě zhè ge yǔ yán de yǔ fǎ(bǐ rú“de, de, de“ hé“le” de bù tóng yòng fǎ) ma?zhī dào zhè ge yǔ yán de jī běn yīn jié(bù bāo kuò shēng diào) zhǐ yǒu408gè ma?

Now the same passage will look like this:

Tán zhōngguó de “yǔ” hé “wén” de wèntí, wǒ juéde zuì hǎo néng xiān liǎo jiè yīxià zài zhōngguó tōngyòng de yǔyán…. Rúguǒ nǐ shǐyòng zhōngguó de gòngtóng yǔyán pǔtōnghuà, nǐ liǎojiě zhège yǔyán de yǔfǎ (bǐrú “de, de, de “hé “le” de bùtóng yòngfǎ) ma? Zhīdào zhège yǔyán de jīběn yīnjié (bù bāokuò shēngdiào) zhǐyǒu 408 gè ma?

At last! Capitalization at the beginning of a sentence and word parsing! But — you knew there was going to be a but, didn’t you? — Google’s Pinyin converter falls significantly short because it still fails completely in two fundamental areas: capitalization of proper nouns and proper use of the apostrophe.

1. Proper Nouns

Google’s Pinyin converter fails to follow the basic point of capitalizing proper nouns. For example, here are some well-known place names. I have prefixed the names with “在” because Google automatically capitalizes the first word in a line; so to see how it handles capitalization of place names something other than the name must go first.

[Screenshot: “在西安, 在长安, 在重庆, 在北京” entered into Google Translate. The translation reads “in Xi’an, in Chang [sic], in Chongqing, in Beijing”, but the romanization line reads “Zai xian, Zai changan, Zai chongqing, Zai beijing”.]

Google Translate gets these right, other than the odd truncation of Chang’an. But the Pinyin converter (see the gray text at the bottom of the image above) fails to capitalize these, even though it correctly parses them as units and thus must “know” their meanings.

The same thing happens with personal names.

Input this:

是馬英九
是毛泽东
是陳水扁

Google Translate provides this:

Is Ma Ying-jeou
Mao Zedong
Chen Shui-bian

Those are correct, if the two missing instances of “Is” are discounted.

But the Pinyin appears as “Shì mǎyīngjiǔ Shì máozédōng Shì chénshuǐbiǎn”. So even though the software understands that these names are units, the capitalization and word parsing are still wrong, and they are still not rendered as they should be in Pinyin: “Mǎ Yīngjiǔ,” “Máo Zédōng,” and “Chén Shuǐbiǎn.”

There is nothing obscure about capitalizing proper nouns. How did this get missed?
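Since the converter already treats these names as units, the missing piece is acting on the knowledge that a unit is a proper noun. A minimal Python sketch, assuming some hypothetical upstream step has already flagged the word as a name (nothing Google documents): capitalize only the initial letter, and write a personal name as two words, surname first. Function names are illustrative.

```python
def capitalize_initial(word: str) -> str:
    """Uppercase only the first letter, leaving tone marks and the rest intact."""
    return word[:1].upper() + word[1:]


def format_personal_name(surname: str, given_name: str) -> str:
    """Write a Chinese personal name in Pinyin: surname and given name as
    separate, capitalized words (e.g. Máo Zédōng)."""
    return f"{capitalize_initial(surname)} {capitalize_initial(given_name)}"
```

Python's built-in str.title() would be the wrong tool here: it treats the apostrophe as a word boundary, so it would turn xī’ān into Xī’Ān.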

2. Apostrophes

The cases of Xi’an and Chang’an above already demonstrate apostrophe omission. Let’s try a few more tests, including some words that are not proper nouns.

Input this:

阿爾巴尼亞
然而
仁愛
蓮藕

The Pinyin is rendered as “Āěrbāníyǎ Ránér Rénài Liánǒu” rather than the correct forms of Ā’ěrbāníyǎ, rán’ér, rén’ài, and lián’ǒu.

As always, I want to stress that, whatever you might have heard elsewhere, apostrophes are not optional. But the rules for their use are easy, so easy that I suspect a fairly simple computer script could fix this problem quickly. (Only about 2 percent of Mandarin words, as written in Hanyu Pinyin, have apostrophes.)
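Here is roughly what such a script could look like. The basic rule: within a word, an apostrophe goes before any syllable after the first that begins with a, o, or e. A minimal Python sketch, assuming the converter already knows the syllable boundaries (it starts from characters, which map to syllables) and hands over one word at a time as a list of syllables; the function name is hypothetical.

```python
import unicodedata

APOSTROPHE = "’"
AOE = frozenset("aoe")


def join_syllables(syllables, apostrophe=APOSTROPHE):
    """Join one word's syllables, inserting an apostrophe before any
    non-initial syllable that begins with a, o, or e (tone-marked or not)."""
    word = ""
    for i, syl in enumerate(syllables):
        # NFD splits off the tone mark, so ě is checked as plain e.
        first = unicodedata.normalize("NFD", syl)[:1].lower()
        if i > 0 and first in AOE:
            word += apostrophe
        word += syl
    return word
```

With the test words above, ["Ā", "ěr", "bā", "ní", "yǎ"] becomes Ā’ěrbāníyǎ and ["rán", "ér"] becomes rán’ér, while ["běi", "jīng"] stays běijīng.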

As with the proper-noun mistakes, these apostrophe errors are all the more puzzling because Google Translate does not appear to share them. Fortunately, these problems should not be particularly difficult to fix, especially if the Pinyin converter can make better use of Google Translate’s database.

Although Google’s failures to implement capitalization of proper nouns and apostrophe use are significant problems, they could likely be corrected quickly and easily. (I strongly suspect this would take considerably less time than it has taken for me to write this post.) The result would be a vastly improved converter. So I am hopeful that Google will work on this soon.

3. Additional work

Once Google gets those basics fixed, it should focus on the simple matter of correcting spacing before and after some quotation marks (which would surely take just a few minutes to take care of) and any other such spacing errors, and on fixing its word parsing related to numbers (which is a bit more complicated, though the basics are easy: everything from 1 to 100 is written solid).
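The 1-to-100 part really is easy. A minimal Python sketch that writes those numbers as single solid words, assuming dictionary (citation) tones with no tone-sandhi marking and leaving larger numbers, which have their own spacing rules, out of scope; the function name is hypothetical.

```python
DIGITS = ["líng", "yī", "èr", "sān", "sì", "wǔ", "liù", "qī", "bā", "jiǔ"]


def number_to_pinyin(n: int) -> str:
    """Write an integer from 1 to 100 as one solid Pinyin word."""
    if not 1 <= n <= 100:
        raise ValueError("this sketch only covers 1 to 100")
    if n == 100:
        return "yībǎi"
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    prefix = "" if tens == 1 else DIGITS[tens]  # 10-19: shí, shíyī, ...
    return prefix + "shí" + (DIGITS[ones] if ones else "")
```

So 35 comes out as sānshíwǔ, one word, never sān shí wǔ.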

Next would come something requiring a bit more care: the proper handling of zhe, guo, and le, Mandarin’s three aspect-marking particles.

And Google should attach the pluralizing suffix -men to the word it modifies rather than leaving it separate (e.g., háizimen, not háizi men).
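The care lies in deciding which zhe, guo, and le are the verbal particles; once that is known, the joining itself is trivial. A minimal Python sketch of just that joining step, assuming a hypothetical tagger has already labeled each token, and following the convention of writing these particles solid with the verb (kànguo) and -men solid with its noun (háizimen). The tag names and function name are illustrative.

```python
# Hypothetical tag names for tokens that attach to the preceding word.
BOUND_TAGS = frozenset({"aspect", "plural"})


def join_bound_forms(tokens):
    """Merge (pinyin, tag) pairs into Pinyin words.

    Tokens tagged "aspect" (verbal zhe, guo, le) or "plural" (-men) are
    appended to the preceding word; every other token starts a new word.
    """
    words = []
    for text, tag in tokens:
        if tag in BOUND_TAGS and words:
            words[-1] += text
        else:
            words.append(text)
    return words
```

For example, join_bound_forms([("háizi", "noun"), ("men", "plural")]) gives ["háizimen"].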

Then, with all of those taken care of, Google would have a pretty good Pinyin converter that I would be happy to praise. Of course even then it could still use other improvements; but those would most likely deal more with particulars than the fundamentals of how Pinyin is meant to be written.

A separate post, to be written soon, will compare the performance of several Pinyin converters (including Google’s). Stay tuned.