How to handle ‘de’ and interjections in Hanyu Pinyin

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) deals with how to write Mandarin’s various de’s, mood particles, and interjections.

This reading is available in two versions:

I’ve already written about the principles in previous posts. For example, see

How to write numbers and measure words in Hanyu Pinyin

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) is about writing numbers and measure words.

This reading is available in two versions:

For more on this, see these posts and the PDFs linked to therein.

How to write verbs in Hanyu Pinyin (Mandarin text)


Here’s the first of several selected readings from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》). It covers the writing of verbs.

This reading is available in two versions:

For those who would like to read about this in English, see

Important book on Pinyin to be excerpted on this site

Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) is the second of Yin Binyong’s two books on Pinyin orthography. The first, Chinese Romanization: Pronunciation and Orthography, is in English and Mandarin; much of it is already available here on Pinyin.Info.

Although Xinhua Pinxie Cidian is only in Mandarin, the large number of examples makes it easy to get the point even if you don’t read Mandarin in Chinese characters very well.

This week I will begin posting some excerpts from this invaluable work. What’s more, I have made a version in traditional Chinese characters, which I hope readers in Taiwan, Hong Kong, and elsewhere will take advantage of. So those not used to reading simplified Chinese characters will have a choice (which is more than the government of Taiwan is providing these days).

I’m extremely happy to be able to bring you this information and wish to acknowledge the generosity of the Commercial Press. Stay tuned.

Taoyuan International Airport to adopt new style for signs

Taoyuan International Airport (or “Taiwan Taoyuan International Airport” as it is called in Taiwan’s official Chinglish form) will be replacing its signage, adopting a new color scheme and typeface.

Currently, the signs in the airport have a black background and yellow or white letters.

The new signs will be modeled after those in the Hong Kong International Airport, with white letters on a blue background. But signs for facilities such as restrooms and restaurants will have white letters on a dark red background. (Perhaps like these?)

Taiwan will also follow Hong Kong’s choice of typeface: Fang Song (fǎng-Sòngtǐ / 仿宋體). One reason, according to the president of the Taoyuan International Airport Corp., is that some Chinese characters, such as those for yuán (園) and guó (國), can look alike when viewed from a distance. “Passengers can clearly see the words on the [new] signs even if they view them from 30 meters away,” he added.

The new signs will start to go up in August, with the change scheduled to be complete by the end of 2012.

I’ve made some samples (which, by the way, contain both 園 and 國) in three typefaces to help illustrate the look of Fang Song. Sorry they aren’t in the right color scheme.

DF Fang Song:
sample of the typeface in three weights, with the text of '台灣桃園國際機場'

DF Kai Sho:
sample of the typeface in three weights, with the text of '台灣桃園國際機場'

DF Ming:
sample of the typeface in three weights, with the text of '台灣桃園國際機場'

sources

stories:

font samples:

additional material:

By the way, the contrast between the traditional and simplified versions of fǎng-Sòngtǐ (仿宋體 / 仿宋体) is a good illustration that, to the untrained eye, the conversion from one system to the other is not necessarily self-evident.

vs.

Simplified Chinese characters being purged from Taiwan government sites

Taiwan’s government Web sites have begun removing versions of their content in simplified Chinese characters at the instruction of President Ma Ying-jeou (Mǎ Yīngjiǔ).

This isn’t just a matter of, say, writing “臺灣” (Taiwan) instead of “台灣” (which, yes, the government here is encouraging). This is much bigger. Entire pages, entire Web sites even, written in simplified Chinese characters are being eliminated.

The Tourism Bureau, for example, removed the simplified-character version of its site from the Web on Wednesday. This comes just as the government is further lifting restrictions on individual Chinese tourists in order to bring in more travelers from China.

The Presidential Office’s spokesman quoted Ma as saying “To maintain our role as the pioneer in Chinese culture, all government bodies should use traditional Chinese in official documents and on their Web sites, so that people around the world can learn about the beauty of traditional characters.” (Is that what pioneers do? I’ll try to find the original Mandarin-language quote later if I get a chance.)

It’s one thing to urge businesses not to remove traditional Chinese characters and replace them with simplified Chinese characters (as the government did on Tuesday). It’s quite another to remove alternate versions in another script — one that a very sizable target audience would have an easier time with.

During the administration of President Chen Shui-bian, the government began adding simplified-character versions of the Mandarin text on official Web sites. The Office of the President’s site was one of them. Now its simplified version is gone, and the same thing is happening across government sites.

Here, for example, are some screen shots I took.

This was the language/script selection at the National Palace Museum’s Web site as of Thursday morning.
“简体中文” (jiǎntǐ Zhōngwén) is brighter because I had my mouse over it to highlight that text.

And here is the language/script selection at the National Palace Museum’s Web site as of Thursday evening:
As you can see, the choice of viewing the site in simplified Chinese characters has been removed.

Here at Pinyin.Info I often have material in Hanyu Pinyin. So I’m certainly not unsympathetic to the idea that sometimes the medium really is a major part of the message. But I doubt that President Ma’s tough-love approach in this area will accomplish anything useful for Taiwan or the survival of traditional Chinese characters; indeed, I believe it will be counter-productive.

To put it more bluntly: this seems like a really, really bad idea.

some sources:

Google Translate’s Pinyin converter revisited

When Google Translate’s Pinyin converter was first released about a year and a half ago, it sucked. Wow, did it ever suck. Since then, however, Google has made some changes, so it seems about time to take another look.

Fortunately, Google’s Pinyin converter is now much better than before.

Here’s the sort of FUBAR romanization — it certainly doesn’t deserve to be called Hanyu Pinyin — Google used to produce:

tán zhōng guó de“yǔ“hé” wén” de wèn tí, wǒ jué de zuì hǎo néng xiān liǎo jiè yī xià zài zhōng guó tōng yòng de yǔ yán。… rú guǒ nǐ shǐ yòng zhōng guó de gòng tóng yǔ yán pǔ tōng huà, nǐ liǎo jiě zhè ge yǔ yán de yǔ fǎ(bǐ rú“de, de, de“ hé“le” de bù tóng yòng fǎ) ma?zhī dào zhè ge yǔ yán de jī běn yīn jié(bù bāo kuò shēng diào) zhǐ yǒu408gè ma?

Now the same passage will look like this:

Tán zhōngguó de “yǔ” hé “wén” de wèntí, wǒ juéde zuì hǎo néng xiān liǎo jiè yīxià zài zhōngguó tōngyòng de yǔyán…. Rúguǒ nǐ shǐyòng zhōngguó de gòngtóng yǔyán pǔtōnghuà, nǐ liǎojiě zhège yǔyán de yǔfǎ (bǐrú “de, de, de “hé “le” de bùtóng yòngfǎ) ma? Zhīdào zhège yǔyán de jīběn yīnjié (bù bāokuò shēngdiào) zhǐyǒu 408 gè ma?

At last! Capitalization at the beginning of a sentence and word parsing! But — you knew there was going to be a but, didn’t you? — Google’s Pinyin converter falls significantly short because it still fails completely in two fundamental areas: capitalization of proper nouns and proper use of the apostrophe.

1. Proper Nouns

Google’s Pinyin converter fails to follow the basic rule of capitalizing proper nouns. For example, here are some well-known place names. I have prefixed each name with “在” because Google automatically capitalizes the first word in a line, so to see how it handles capitalization of place names, something other than the name must come first.

screenshot showing what happens if the following is entered into Google Translate: '在西安, 在长安, 在重庆, 在北京'. That leads to the following in Google Translate: 'in Xi'an, in Chang [sic], in Chongqing, in Beijing'. But the romanization line reads 'Zai xian, Zai changan, Zai chongqing, Zai beijing'

Google Translate gets these right, other than the odd truncation of Chang’an. But the Pinyin converter (see the gray text at the bottom of the image above) fails to capitalize these, even though it correctly parses them as units and thus must “know” their meanings.

The same thing happens with personal names.

Input this:

是馬英九
是毛泽东
是陳水扁

Google Translate provides this:

Is Ma Ying-jeou
Mao Zedong
Chen Shui-bian

Those are correct, apart from the missing “Is” on the second and third lines.

But the Pinyin appears as “Shì mǎyīngjiǔ Shì máozédōng Shì chénshuǐbiǎn”. So even though the software understands that these names are units, the capitalization and word parsing are still wrong, and the names are not rendered as they should be in Pinyin: “Mǎ Yīngjiǔ,” “Máo Zédōng,” and “Chén Shuǐbiǎn.”
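To show how little is involved once a converter has already recognized a name as a unit, here is a rough Python sketch. It is not Google’s code, and the function names are hypothetical. It assumes the syllables have already been identified and, for personal names, already split into surname and given name; apostrophes (as in Xī’ān) are left aside here.

```python
def capitalize_word(word: str) -> str:
    """Uppercase the first letter of a Pinyin word; tone marks are kept."""
    return word[:1].upper() + word[1:]


def place_name(syllables: list[str]) -> str:
    """A place name is written as a single capitalized word."""
    return capitalize_word("".join(syllables))


def personal_name(surname: list[str], given_name: list[str]) -> str:
    """Surname and given name are separate words, each capitalized;
    the syllables within each part are written solid."""
    return f"{capitalize_word(''.join(surname))} {capitalize_word(''.join(given_name))}"


print(place_name(["běi", "jīng"]))             # Běijīng
print(personal_name(["máo"], ["zé", "dōng"]))  # Máo Zédōng
```

A compound surname such as Ōuyáng would simply be passed as `["ōu", "yáng"]`.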

There is nothing obscure about capitalizing proper nouns. How did this get missed?

2. Apostrophes

The cases of Xi’an and Chang’an above already demonstrate apostrophe omission. Let’s try a few more tests, including some words that are not proper nouns.

Input this:

阿爾巴尼亞
然而
仁愛
蓮藕

The Pinyin is rendered as “Āěrbāníyǎ Ránér Rénài Liánǒu” rather than the correct forms of Ā’ěrbāníyǎ, rán’ér, rén’ài, and lián’ǒu.

As always, I want to stress that, whatever you might have heard elsewhere, apostrophes are not optional. But the rules for their use are easy, so easy that I suspect a fairly simple computer script could fix this problem quickly. (Only about 2 percent of Mandarin words, as written in Hanyu Pinyin, have apostrophes.)
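How simple? Here is a minimal Python sketch of the rule: an apostrophe goes before a syllable beginning with a, o, or e when that syllable follows another syllable in the same word. It assumes the converter has already split the text into words and syllables, which its current output shows it can do. The function name is hypothetical, not anything from Google.

```python
import unicodedata

# Letters (with or without a tone mark) that trigger an apostrophe when a
# syllable beginning with them follows another syllable in the same word.
A_O_E = set("aāáǎàoōóǒòeēéěè")


def join_syllables(syllables: list[str]) -> str:
    """Join one word's syllables, inserting ’ before each non-initial
    syllable that starts with a, o, or e."""
    word = ""
    for i, syllable in enumerate(syllables):
        # Use precomposed characters so "e" + combining mark counts as one letter.
        syllable = unicodedata.normalize("NFC", syllable)
        if i > 0 and syllable[:1].lower() in A_O_E:
            word += "’"
        word += syllable
    return word


for syllables in (["Ā", "ěr", "bā", "ní", "yǎ"], ["rán", "ér"], ["rén", "ài"], ["lián", "ǒu"]):
    print(join_syllables(syllables))
```

The `i > 0` check matters: a word that merely starts with a, o, or e, such as ài on its own, gets no apostrophe.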

As with the mistakes in proper nouns, these apostrophe errors are all the more puzzling because Google Translate does not appear to share them. Fortunately, these problems should not be particularly difficult to fix, especially if the Pinyin converter can make better use of Google Translate’s database.

Although Google’s failures to implement capitalization of proper nouns and apostrophe use are significant problems, they could likely be corrected quickly and easily. (I strongly suspect this would take considerably less time than it has taken for me to write this post.) The result would be a vastly improved converter. So I am hopeful that Google will work on this soon.

3. Additional work

Once Google gets those basics fixed, it should turn to the simple matter of correcting the spacing before and after some quotation marks (surely just a few minutes’ work) and any other such spacing errors. Then it should fix its word parsing of numbers, which is a bit more complicated, though the basics are easy: everything from 1 to 100 is written solid.
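For the easy part of the number rule, here is a minimal Python sketch that writes 1 through 100 as single solid words. It uses dictionary tones (yī, not tone-sandhi forms). It also shows the apostrophe rule turning up inside numbers, as in shí’èr (12). The function name is hypothetical, and larger numbers are outside its scope.

```python
# Dictionary (citation) tones for the digits 1-9; index 0 is unused.
DIGITS = ["", "yī", "èr", "sān", "sì", "wǔ", "liù", "qī", "bā", "jiǔ"]
A_O_E = "aāáǎàoōóǒòeēéěè"


def number_to_pinyin(n: int) -> str:
    """Write an integer from 1 to 100 in Pinyin as one solid word."""
    if not 1 <= n <= 100:
        raise ValueError("this sketch only covers 1 to 100")
    if n == 100:
        return "yībǎi"
    tens, ones = divmod(n, 10)
    syllables = []
    if tens == 1:
        syllables.append("shí")  # 10-19 start with shí, not yīshí
    elif tens > 1:
        syllables += [DIGITS[tens], "shí"]
    if ones:
        syllables.append(DIGITS[ones])
    word = syllables[0]
    for syllable in syllables[1:]:
        if syllable[0] in A_O_E:  # among digits, only èr triggers this
            word += "’"
        word += syllable
    return word


print([number_to_pinyin(n) for n in (1, 10, 12, 20, 37, 100)])
```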

Next would come something requiring a bit more care: the proper handling of Mandarin’s three aspect particles (often loosely called tense markers): zhe, guo, and le.

And Google should attach the pluralizing suffix -men to the word it modifies rather than leaving it separate (e.g., háizimen, not háizi men).
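Both of these amount to joining a token to the word before it, as in kànle, kànzhe, kànguo, and háizimen. Here is a minimal Python sketch built on a loudly stated assumption: the input has already been tagged, using made-up tags (V for a verb, ASP for an aspect particle, PL for -men). The tagging is the hard part. It is what separates the le that attaches to a verb from a sentence-final le, which is generally written as a separate word.

```python
# Hypothetical tags: "V" = verb, "ASP" = aspect particle (zhe, le, guo),
# "PL" = the plural suffix -men. Any other tag leaves the token as its own word.
def join_tagged(tokens: list[tuple[str, str]]) -> str:
    words: list[str] = []
    prev_tag = ""
    for text, tag in tokens:
        attaches = (tag == "ASP" and prev_tag == "V") or tag == "PL"
        if attaches and words:
            words[-1] += text  # e.g. kàn + guo -> kànguo, háizi + men -> háizimen
        else:
            words.append(text)
        prev_tag = tag
    return " ".join(words)


print(join_tagged([("háizi", "N"), ("men", "PL"), ("kàn", "V"),
                   ("guo", "ASP"), ("diànyǐng", "N")]))  # háizimen kànguo diànyǐng
print(join_tagged([("tā", "PRON"), ("lái", "V"), ("le", "PART")]))  # tā lái le
```

Because the sentence-final le in the second example carries a different tag (PART here), it stays separate.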

Then, with all of those taken care of, Google would have a pretty good Pinyin converter that I would be happy to praise. Of course even then it could still use other improvements; but those would most likely deal more with particulars than the fundamentals of how Pinyin is meant to be written.

A separate post, to be written soon, will compare the performance of several Pinyin converters (including Google’s). Stay tuned.

Oxford Chinese Dictionary goes online

Oxford University Press has just announced that its massive Oxford Chinese Dictionary is now available through its Oxford Language Dictionaries Online subscription service.

I haven’t seen the online version yet myself; but from the publisher’s description it appears to be largely the same as the published edition, whose paucity of Pinyin is disappointing. The publisher, however, is promising that “Pinyin will be added to all Chinese translations” in November, which should be a major step forward.

Perhaps some of you at universities have institutional access. I would welcome reports.

source: What’s New, Oxford Language Dictionaries Online, May 2011.