How to write verbs in Hanyu Pinyin

Today’s release from Yin Binyong’s Chinese Romanization: Pronunciation and Orthography is a long, important section that covers verbs in Hanyu Pinyin (2 MB PDF).

In this post I’ll go over the rules for what to do with Mandarin’s three tense-marking particles: zhe (著/着), guo (過/过), and le (了). These particles are extremely common, and people are often unaware of how they should be written in Pinyin. Fortunately, this is pretty easy: -zhe and -guo are always written solid (with no interposing space or hyphen) with the verb they follow. The case of le is more complicated (but not too much trouble).

-zhe 著/着

-zhe is added onto a verb to indicate the ongoing nature of an action or state, whether in the past, present, or future. It thus bears a certain similarity to the English verb suffix -ing. A sentence in which -zhe is used tends to emphasize the description of the action or state indicated by the verb. Since no other sentence component may be interposed between a verb and -zhe, a general rule may be stated: -zhe is always written as one unit with the verb it follows.

Some examples:

Tā wēixiàozhe duì wǒ shuō: “Nǐ lái ba!”
她微笑著對我說: “你來吧!”
(Smiling, she said to me, “Come on!”)

Nǐ xiān děngzhe, ràng wǒ jìnqu kànkan.
你先等著,讓我進去看看.
(You wait out here while I go in and look.)

Note that “kànkan” in the sentence above shows something else about verbs in Hanyu Pinyin: the second part of a reduplicated verb is in the neutral tone.

-guo 過/过

-guo is added after a verb to indicate that a given person or object has experienced the action expressed by the verb. -guo may only be used in the past tense. Since no other sentence component may be interposed between a verb and -guo, a general rule may be formulated: -guo is always written as one unit with the verb it follows.

Some examples:

Wǒ xuéguo liǎng nián Yīngyǔ, dànshì méi xuéguo Rìyǔ.
我學過兩年英語,但是沒學過日語.
(I’ve studied two years of English, but I haven’t studied Japanese.)

Nà běn shū wǒ kànjianguo, hǎoxiàng zài shūjià shang.
那本書我看見過,好像在書架上.
(I have seen that book somewhere; I think it’s on the bookshelf.)

le 了

The tense-marking particle le is added after a verb to emphasize that the action expressed has been completed or that the state indicated has been achieved. -le is ordinarily written as one unit with the verb it follows.

For example:

Zuótiān wǎnshang wǒ kànle yī chǎng diànyǐng.
昨天晚上我看了一場電影.
(I saw a movie yesterday evening.)

But here’s where it starts to get a little more complicated.

If a verb complement is interposed between the verb and the tense marker -le in a sentence, there are two possible written forms. If the verb and its complement are written as a unit, then -le is written as a unit with them; if they are written separately, then -le too is written separately.

For example:

Xiǎo Chén qīngqīng de guānshangle fángmén.
小陳輕輕的關上了房門.
(Xiao Chen gently closed the house door.)

But also:

Tā cóng shūbāo lǐ ná chūlai le liǎng běn liánhuánhuà.
他從書包裡拿出來了兩本連環畫.
(He pulled two comic books out of his bookbag.)
(ná 拿 — verb; chūlai 出來 — complement)

I suspect that’s the sort of thing that may well change (for the simpler) once Pinyin makes it out into the world of popular usage as a script in its own right. But for now I’m just givin’ the rules as I find ’em.

Speaking of which, here’s the final twist on -le.

Apart from its function as a tense-marking particle, -le can also serve as a mood-marking particle. (The former usage is usually denominated le1 and the latter le2 in grammar texts.) In its latter capacity, le always appears at the end of a sentence or clause, just before a comma, period, or other punctuation mark. The two different le’s, le1 and le2, are sometimes quite difficult to distinguish in practice. With this in mind, and with the aim of simplifying HP orthography, the following simple rule is set out: any le, whether le1 or le2, appearing at the end of a sentence or clause is to be written by itself.

That’s actually a good thing, since it simplifies matters. So, for anyone programming a Pinyin converter: put a space before le if it is immediately followed by punctuation.

Thus, for example:

Wǒmen túshūguǎn yǐjing mǎile sānwàn duō běn shū le.
我們圖書館已經買了三萬多本書了.
(Our library has already purchased over thirty thousand books.)

Hǎo le, hǎo le, dàjiā dōu bié chǎo le.
好了好了, 大家都別吵了.
(All right, all right, everybody quiet down.)
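
For converter writers, the end-of-clause rule can be sketched in a few lines of Python. This is only a rough illustration, not code from Yin’s book or from any existing converter. The function name `space_final_le` and the punctuation set are my own assumptions, and the sketch presumes tone-marked input in which an unmarked “le” at the end of a word is the particle.

```python
import re
import unicodedata

# Punctuation that can close a sentence or clause.
# This set is an assumption for illustration and is certainly incomplete.
_CLAUSE_END = r"[,.;:!?”’)]"

# An unmarked "le" glued to the preceding letter and followed
# by clause-ending punctuation or by the end of the text.
_FINAL_LE = re.compile(r"(?<=\w)le(?=" + _CLAUSE_END + r"|$)")


def space_final_le(text: str) -> str:
    """Write le by itself when it closes a sentence or clause."""
    # Normalize so that tone-marked vowels are single letters that \w matches.
    text = unicodedata.normalize("NFC", text)
    return _FINAL_LE.sub(" le", text)


print(space_final_le("Hǎole, hǎole, dàjiā dōu bié chǎole."))
```

A -le in the middle of a clause, such as the one in kànle, is left alone, because it is followed by a space rather than by punctuation.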

Remember: This post covered only one small aspect of the entire reading. So be sure to download and read the entire PDF, which has many, many more examples.

It’s also a very useful reading for students of Mandarin.

new tools for writing Pinyin

I’ve received word from software writers of not one but two useful new tools for writing Hanyu Pinyin with tone marks (i.e., not using Pinyin to enter Chinese characters but really writing Hanyu Pinyin texts).

Pīnyīn Editor, by Bengt Moss-Petersen, is an online tool that currently works best with IE 6+ and Firefox.

click to visit the online Pinyin editor

(I made the text much larger than the default size, since I had to reduce the image to make it fit in my blog. Users can choose among several sizes and fonts.)

And Pinyin Builder, by Wayne Kirk, is freeware for Windows systems.

click to visit the download page for Pinyin Builder

If you have an open Microsoft Office document, clicking Pinyin Builder’s “GO” button will insert your Pinyin text into that document. You don’t need to bother with copying and pasting.

In both of these, ü + tone mark is produced by v + tone number. Pinyin Builder also offers a combination using the CTRL key.

The tone number can be entered either immediately after the vowel or later in the syllable (e.g., zho1ng, zhong1, and zhon1g all yield “zhōng”). Pinyin Editor also offers the option to simply click on buttons with the vowels and tone marks.
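
For the curious, here is a minimal Python sketch of how that kind of input could be handled. It is not the code of either tool, which I haven’t seen; the function name `numbered_to_marked` is hypothetical. It treats tone 5 as the neutral tone (an assumption), and it uses the standard Hanyu Pinyin placement rule: the mark goes on a or e if present, on the o of ou, and otherwise on the last vowel.

```python
import re

# Tone-marked forms of each vowel, indexed by tone number minus one.
_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def numbered_to_marked(syllable: str) -> str:
    """Convert one syllable such as 'zho1ng' or 'lv4' to tone-marked Pinyin.

    Simplification: the result is always lower case.
    """
    s = syllable.lower().replace("v", "ü")
    match = re.search(r"[1-5]", s)
    if match is None:
        return s
    tone = int(match.group())
    # Drop the digit, wherever in the syllable it was typed.
    s = s[:match.start()] + s[match.end():]
    if tone == 5:  # assumed here to mean neutral tone: no mark
        return s
    if "a" in s:
        idx = s.index("a")
    elif "e" in s:
        idx = s.index("e")
    elif "ou" in s:
        idx = s.index("o")
    else:
        vowels = [i for i, ch in enumerate(s) if ch in _MARKS]
        if not vowels:
            return s
        idx = vowels[-1]
    return s[:idx] + _MARKS[s[idx]][tone - 1] + s[idx + 1:]


for typed in ("zho1ng", "zhong1", "zhon1g", "lv4"):
    print(typed, "->", numbered_to_marked(typed))
```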

I hope people make frequent use of both of these terrific new tools.

letters with diacritics: a roughly alphabetical chart

For those who don’t know an ogonek from a retroflex hook — and sometimes for those who do — finding a needed letter with a diacritical mark can be a time-consuming process. (I look forward to the days when combining marks are much better supported.)

So I made a chart with lots of — but certainly not all — diacritics, sorted alphabetically by appearance as well as name and sound. That means, for example, that a thorn (þ) can be found under p as well as under t (as in th), even though — I know, I know — p and þ are unrelated.

Perhaps some people will find it quicker to use than going through the various Unicode charts or searching through various other charts in which the letters are grouped by sound rather than appearance. Someone has probably already made one of these, and done a better job. But I didn’t have any luck finding it before hacking out my own.

Here it is, for what it’s worth: letters with diacritical marks, grouped alphabetically.

I hope some people find it useful.

Taiwan personal names: a frequency list

Imagine taking everyone in the United States with the family name of Johnson, Williams, Jones, Brown, Davis, Miller, Wilson, Moore, Taylor, or Anderson … and giving them all the new family name of “Smith.” Then add to the Smiths everyone surnamed Thomas, Jackson, White, Harris, Martin, Thompson, Garcia, Martinez, Robinson, Clark, Rodriguez, Lewis, Lee, Walker, Hall, Allen, Young, Hernandez, King, Wright, and Lopez. Those are, in descending order beginning with Smith, the 32 most common family names in the United States. It takes all of those names together to reach the same frequency that the name “Chen” (Hoklo: Tân) has in Taiwan.

Chen covers 10.93 percent of the population here, according to figures released by Chih-Hao Tsai based on the recent release of the names of the 81,422 people who took Taiwan’s college entrance exam this year.

By way of additional contrast, Smith, the most common family name in the United States, covers just 1.00 percent of the population there.

In Taiwan, the nine most common family names cover half (50.22 percent) of the population. Covering the same percentage in the United States requires the top 1,742 names there. And covering the same percentage as Taiwan’s top 25 names (74.17 percent) requires America’s top 13,425 surnames.

So if you’re just getting started in Mandarin, consider that you’ll get a lot of mileage out of memorizing the tones for the top ten names.

family name (Mandarin form)   spelling usually seen in Taiwan   percent of total   cumulative percentage
Chén                          Chen                              10.93%             10.93%
Lín                           Lin                               8.36%              19.29%
Huáng                         Huang                             6.06%              25.35%
Zhāng                         Chang                             5.39%              30.74%
                              Li, Lee                           5.20%              35.94%
Wáng                          Wang                              4.20%              40.14%
                              Wu                                4.03%              44.17%
Liú                           Liu                               3.18%              47.36%
Cài                           Tsai                              2.86%              50.22%
Yáng                          Yang                              2.64%              52.86%
                              Hsu                               2.32%              55.18%
Zhèng                         Cheng                             1.86%              57.05%
Xiè                           Hsieh                             1.77%              58.82%
Qiū                           Chiu                              1.50%              60.32%
Guō                           Kuo                               1.48%              61.79%
Zēng                          Tseng                             1.45%              63.24%
Hóng                          Hung                              1.40%              64.64%
Liào                          Liao                              1.38%              66.02%
                              Hsu                               1.33%              67.35%
Lài                           Lai                               1.32%              68.66%
Zhōu                          Chou                              1.24%              69.90%
                              Yeh                               1.18%              71.08%
                              Su                                1.17%              72.25%
Jiāng                         Chiang                            0.97%              73.22%
                              Lu                                0.94%              74.17%
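
The cumulative column is just a running total of the percentages, and it is easy to recompute. Here is a small Python sketch that does so for the first ten rows and reports how many names it takes to pass a given share of the population. The function names are hypothetical. Because the published percentages are rounded, the recomputed totals can trail the table’s cumulative column by a hundredth or two.

```python
from itertools import accumulate

# Percent-of-total figures for the first ten rows of the table, stored as
# integer hundredths of a percent so that no float rounding creeps in.
TOP_TEN = [
    ("Chen", 1093), ("Lin", 836), ("Huang", 606), ("Chang", 539),
    ("Li", 520), ("Wang", 420), ("Wu", 403), ("Liu", 318),
    ("Tsai", 286), ("Yang", 264),
]


def running_totals(rows):
    """Cumulative totals, in hundredths of a percent."""
    return list(accumulate(value for _, value in rows))


def names_needed(rows, threshold_percent):
    """How many of the most common names are needed to reach the threshold."""
    target = round(threshold_percent * 100)
    for count, total in enumerate(running_totals(rows), start=1):
        if total >= target:
            return count
    return None  # the rows supplied never reach the threshold


for (name, _), total in zip(TOP_TEN, running_totals(TOP_TEN)):
    print(f"{name:<6} {total / 100:.2f}%")
print("names needed for 50 percent:", names_needed(TOP_TEN, 50))
```

With the rounded figures in the table, the running total passes 50 percent at the ninth name (Tsai).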

For those wanting the Taiwanese (Hoklo) forms of these names, see Tailingua’s list of Common Family Names in Taiwan.

On the other hand, common given names have much greater variety in Taiwan than in America, especially in the case of males. In the United States the top 10 names for males cover 23.185 percent of the male population, and the top 10 names for females cover 10.703 percent of the population. In Taiwan, however, the top 10 given names (male and female together) cover just 1.49 percent of the population.

critique of proposed guidelines for writing Taiwan place names

Several months ago I wrote about the move by Taiwan’s Ministry of the Interior (MOI) to impose Tongyong Pinyin by instituting standards for the writing of place names. (See MOI and Tongyong Pinyin: update). I was told that my remarks had been translated into Mandarin and distributed to those involved. But I have never received any response, despite more than one follow-up call. Although I never much expected to receive a useful response anyway, I had hoped for at least something.

Keep in mind that these are remarks aimed at those in the central government, who, at least for the time being, are compelled to work within the framework of Tongyong Pinyin. Also, I tried to stick as much as possible to the examples in the government’s draft, thus my use of “Jhuzih Hu,” which is both Tongyong Pinyin and a name whose word parsing is more complicated than most.

I have amended a few details, deleted some sections with personal details, and removed the conclusion, which was mainly polite blah-blah-blah.

I would welcome comments and suggestions for revisions.

Response to Taiwan’s Proposed Guidelines for Place Names in Romanization and English

As you are surely aware, Taiwan’s government has a very poor record when it comes to romanization. So the government now has an important opportunity to show Taiwan’s foreign community and others here who care about standards and are pained by the nation’s sloppiness in this regard that it is finally giving the issue the care it deserves. Unfortunately, the proposed guidelines in their present state would do little to improve the situation and in some cases could make things worse. Specifically, the proposed guidelines have seven basic problems.

  1. Failure to use Hanyu Pinyin
  2. Failure to use apostrophes correctly
  3. Failure to use hyphens correctly
  4. Partial failure to indicate individual words correctly
  5. Failure to handle non-Chinese names correctly
  6. Failure to consider instances where tone marks might be useful or even necessary
  7. Failure to fix old, misleading spellings

Before I give details about the problems listed above I would like to note that the guidelines are, however, correct in one important way: Place names should begin with a capital letter followed by lower-case letters. The Taipei City Government made an enormous mistake when it instituted the practice of adding extra capital letters where none are needed.

WRONG                RIGHT
NanJing East Road    Nanjing East Road
TianMu               Tianmu
TaiNan               Tainan

The Taipei City Government’s foolish policy of ExTra CaPiTal LettErs also helped bring about another major problem in Taipei: the omission of apostrophes before syllables beginning with a, e, and o. This will be addressed in my second point. But first comes the introductory one.

1. Failure to use Hanyu Pinyin

I know that the issue of Hanyu Pinyin vs. Tongyong Pinyin is not supposed to be on the table, so I do not expect any action to be taken on this for now. Nevertheless, I believe it necessary to remind the Ministry and those responsible for reviewing the guidelines that members of the international community — both within and outside of Taiwan — overwhelmingly support the adoption of Hanyu Pinyin for Mandarin and oppose the use of Tongyong Pinyin. There is simply no green/blue divide among foreigners on this issue; an overwhelming majority of “green” foreigners oppose Tongyong Pinyin and strongly support Hanyu Pinyin; and an overwhelming majority of “blue” foreigners feel the same way. For foreigners, this is a practical matter, not a political one.

The government’s insistence upon the use of Tongyong Pinyin has cost Taiwan respect and is having an impact on students’ choices of where to study Mandarin. Moreover, the lack of a consistent, correct, and internationalized romanization system considerably complicates Taiwan’s efforts to lure more tourists to the island. The government should abandon Tongyong Pinyin immediately, before it does any more harm. Too much time, money, and effort have been wasted already.

Nevertheless, some of the damage that has been done could be repaired if the government implements the best possible guidelines for the use of the romanization system it continues to insist upon. The proposed guidelines, however, are at best insufficient and thus are in need of significant revision.

This brings me to my main points.

2. Failure to use apostrophes correctly

The MOI guidelines correctly indicate that something is needed to distinguish syllables beginning with a, e, and o. But the MOI guidelines use the wrong method to indicate these breaks.

The MOI says that people should use a hyphen before syllables beginning with a, e, and o. This is a very bad idea. The correct way to do this is by using an apostrophe. Here is the rule Taiwan should adopt: “Put an apostrophe before any syllable that begins with a, e, or o, unless that syllable comes at the beginning of a word or immediately follows a hyphen or other dash.”

Table: Examples of how to write words that have inner syllables beginning with a, e, or o

WRONG     RIGHT
Da-an     Da’an
Su-ao     Su’ao
Ren-ai    Ren’ai
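
The apostrophe rule quoted above is mechanical enough to express in code. What follows is a minimal Python sketch, not anything taken from the MOI guidelines. The function name `join_syllables` and the convention of passing a hyphen as its own list item are my own assumptions for illustration.

```python
import unicodedata


def _starts_with_aeo(syllable: str) -> bool:
    """True if the syllable begins with a, e, or o, with or without a tone mark."""
    # NFD splits a tone-marked vowel into base letter plus combining mark.
    first = unicodedata.normalize("NFD", syllable)[:1].lower()
    return first in ("a", "e", "o")


def join_syllables(parts, apostrophe="’"):
    """Join syllables into one word, adding apostrophes where the rule requires.

    A hyphen is passed as its own item, e.g. ["Su", "-", "Hua"].
    """
    out = []
    for part in parts:
        previous = out[-1] if out else ""
        # No apostrophe at the start of a word or right after a hyphen.
        if previous and previous != "-" and _starts_with_aeo(part):
            out.append(apostrophe)
        out.append(part)
    return "".join(out)


for syllables in (["Da", "an"], ["Su", "ao"], ["Ren", "ai"], ["Tai", "nan"]):
    print(join_syllables(syllables))
```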

The main reason it is crucial not to use a hyphen in such places is that hyphens have other important uses, which I will discuss next.

3. Failure to use hyphens correctly

Hyphens are especially important when it comes to assigning names to places and things (especially things representing abbreviations and things that join two places).

WRONG               RIGHT                REASON
Suhua Expressway    Su-Hua Expressway    This road runs between Su’ao and Hualian.
Beiyi Expressway    Bei-Yi Expressway    This road runs between Taipei (Taibei) and Yilan. (And for heaven’s sake don’t make this “Pei-Yi.”)
Jianan dazun        Jia-Nan dazun        Jia-Nan refers to Jiayi and Tainan (嘉南大圳).
Huajiang Bridge     Hua-Jiang Bridge     The bridge joins Wanhua and Jiangzicui.
Sun Moon Lake       Sun-Moon Lake        These are joined elements.
Taida               Tai-Da               An abbreviation for Taiwan Daxue (台灣大學).

See https://pinyin.info/readings/texts/hyphens.html for details and additional ways that hyphens can help clarify Pinyin.

4. Partial failure to indicate individual words correctly

The guidelines are correct that there should be spaces between words (詞) but not between mere syllables (字). But the guidelines are too vague — and sometimes incorrect! — about how to determine what a word is (and thus what should be written separately).

Taiwan should use the guidelines that have already been worked out for these principles and have been accepted internationally. I am referring, of course, to the guidelines for Hanyu Pinyin, which are covered in general here — https://pinyin.info/rules/pinyinrules.html — and in detail in two books: Chinese Romanization: Pronunciation and Orthography (漢語拼音和正詞法) (ISBN 7-80052-148-6) and 新華拼寫詞典 (ISBN 7-100-03414-0). The latter book is sometimes available at the main Eslite bookstore near Taipei City Hall. The best Mandarin-English dictionary following these principles is the ABC Chinese-English Comprehensive Dictionary, edited by John DeFrancis; you should also use it as a standard reference.

Supporters of Tongyong Pinyin have often touted that system’s supposed “compatibility” with Hanyu Pinyin. Having the two systems share the same basic guidelines would be a good way to demonstrate that this is something more than empty words.

Most of the examples in the guidelines are correct. A few need revision.

WRONG           RIGHT
Yangmingshan    Yangming Shan
Jhuzihhu        Jhuzih Hu [Zhuzi Hu]

5. Failure to handle non-Chinese names correctly

Just a few days ago President Chen Shui-bian (whose name, I note, is spelled in Hanyu Pinyin, not Tongyong Pinyin; but no one confuses him with the president of the People’s Republic of China!) was in Tainan County to mark the opening of some new roads around the Southern Taiwan Science Park. Each of the three roads has been given a name from an aboriginal language, something the president praised. Yet the government’s guidelines would force Mandarin upon the aboriginal names, changing them to something that would be incorrect.

Similarly, the administration has supported Aborigines regaining their original names and even villages reacquiring their original, non-Chinese names. (See, for example, http://news.yam.com/cna/garden/200708/20070801554267.html )

Ideally, no Chinese characters would be used with some of these names; but I don’t expect that to happen soon.

WRONG                    RIGHT
Kaidagelan               Ketagalan
Tailuge                  Taroko
Sihmakusih (司馬庫斯)    Smangus

Attention must also eventually be given to the issue of using Sinitic languages other than Mandarin (specifically Taiwanese and Hakka) in place names.

6. Failure to consider instances where tone marks might be useful or even necessary

Because Mandarin is a tonal language, a few names that are different may appear to be identical in romanization unless tone marks are included. In practice, only a very small percentage of names are subject to this ambiguity. Taipei, for example, has more than 600 different street names; but only the following would need attention there.

Chinese characters    Pinyin and English mix
景華街                Jǐnghuá St.
景化街                Jǐnghuà St.
同安街                Tóng’ān St.
通安街                Tōng’ān St.
萬慶街                Wànqìng St.
萬青街                Wànqīng St.
五常街                Wǔcháng St.
武昌街                Wǔchāng St.
向陽路                Xiàngyáng Rd.
襄陽路                Xiāngyáng Rd.
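
Pairs like these can be found automatically: strip the tone marks from every name and see which names end up spelled the same. Below is a rough Python sketch of that check. It is my own illustration (the function names are hypothetical), not the method used to compile the list above.

```python
import unicodedata
from collections import defaultdict

# Combining macron, acute, caron, and grave: the four Pinyin tone marks.
# The diaeresis of ü (U+0308) is deliberately not in this set.
_TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin: str) -> str:
    """Remove tone marks but keep the two dots of ü."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in _TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def find_collisions(names):
    """Group names that become identical once tone marks are removed."""
    groups = defaultdict(list)
    for name in names:
        groups[strip_tones(name)].append(name)
    return {plain: marked for plain, marked in groups.items() if len(marked) > 1}


streets = ["Jǐnghuá", "Jǐnghuà", "Wǔcháng", "Wǔchāng", "Tóng’ān"]
print(find_collisions(streets))
```

In this small sample only the Jinghua and Wuchang pairs are reported; Tóng’ān is listed without its partner, so it does not collide with anything.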

For the benefit of foreigners and to aid clarity, tone marks should follow the practice of Hanyu Pinyin, not of Zhuyin Fuhao, i.e. first tone should be indicated (ā, ē, ī, ō, ū, and ǖ; not a, e, i, o, u, and ü). This is especially important because most names are written without tone marks; we should not get these confused with words that have only first-tone syllables, such as Tōng’ān (通安).

One possibility would be to use tone marks on only the less common name(s). For example, we would write 五常街 as “Wǔcháng Street” but 武昌街 simply as “Wuchang Street” (rather than as “Wǔchāng Street”).

Some would advocate using tone marks on most if not all signage with Pinyin. This deserves study.

7. Failure to fix old, misleading spellings

Several years ago when the central government promulgated Tongyong Pinyin it kept the old spellings for some cities and all counties (other than Yilan, which changed from “Ilan”). This was a mistake. The old spellings are inherently ambiguous in pronunciation and are often quite simply misleading.

The government should end the policy of retaining most old spellings. Quite simply, there is nothing useful to foreigners or anyone else about retaining, for example, “Taitung” for what should be spelled “Taidong.” A limited, practical approach for the time being would be to immediately change all names that are spelled the same way in Tongyong Pinyin and Hanyu Pinyin, with the possible exception of retaining “Taipei” instead of switching to “Taibei.”

WRONG       RIGHT
Taitung     Taidong
Matsu       Mazu
Kinmen      Jinmen
Hualien     Hualian
Chiayi      Jiayi
Pingtung    Pingdong
Keelung     Jilong

New story in Pinyin: ‘Dashui Guohou’ (‘After the Flood’)

I’m very pleased to announce that Pinyin Info has a new story in Hanyu Pinyin: “Dàshuǐ Guòhòu,” by Zhang Liqing. It’s available here in two versions: Pinyin alone and Pinyin with English translation (as “After the Flood”), so one doesn’t even have to know Mandarin or Pinyin to read this.

The story recalls a girlhood friend in China, not long after the end of the Second World War.

Zhang is an associate editor of the ABC Chinese-English Comprehensive Dictionary and has translated a number of important works into English, including Zhou Youguang’s The Historical Evolution of Chinese Languages and Scripts and Lü Shuxiang’s Comparing Chinese Characters and a Chinese Spelling Script — an evening conversation on the reform of Chinese characters.

Tonally Orthographic Pinyin

Tonally Orthographic Pinyin (TOP) is a modification of Hanyu Pinyin that uses capitalization practices to distinguish between the various tones of Mandarin.

This can mess with the capitalization found at the beginnings of sentences and proper nouns, so I have mixed feelings about it. But many find TOP useful as a learning tool and in writing text messages.

Here’s how TOP’s creator, Terry Thatcher Waltz, describes the system:

FIRST TONES ARE WRITTEN IN ALL CAPS. YOUR VOICE IS HIGH.

seconD toneS arE writteN witH thE lasT letteR capitalizeD. that’S becausE youR voicE haS tO risE.

third tones are written all lower case. that’s because the voice is low. (let’s keep discussions on the true nature of third and half-third tones somewhere else — this system is just to help us poor foreigners internalize tones!)

Fourth Tone Has The First Letter Of Each Word Capitalized, Because Your Voice Starts High And Then Falls Downward.

Thus, the phrase “wǒ měitiān liànxí Hànyǔ” would be written “wo meiTIAN LianxI Hanyu” in TOP.
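
The four rules are simple enough to automate. Here is a minimal Python sketch that converts tone-marked syllables to TOP. It is my own illustration rather than anything supplied by TOP’s creator; the function names are hypothetical. The description above does not cover neutral-tone syllables, so leaving them in lower case is an assumption. Words must be passed in already split into syllables.

```python
import unicodedata

# Combining tone marks mapped to tone numbers.
_TONE_OF_MARK = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def syllable_to_top(syllable: str) -> str:
    """Convert one tone-marked syllable to Tonally Orthographic Pinyin."""
    tone = 0
    letters = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in _TONE_OF_MARK:
            tone = _TONE_OF_MARK[ch]
        else:
            letters.append(ch)
    plain = unicodedata.normalize("NFC", "".join(letters)).lower()
    if not plain:
        return plain
    if tone == 1:  # all caps
        return plain.upper()
    if tone == 2:  # last letter capitalized
        return plain[:-1] + plain[-1].upper()
    if tone == 4:  # first letter capitalized
        return plain[0].upper() + plain[1:]
    # Third tone is all lower case; unmarked (neutral) syllables are
    # treated the same way here, which is an assumption.
    return plain


def sentence_to_top(words) -> str:
    """Each word is a list of syllables; words are joined by spaces."""
    return " ".join("".join(syllable_to_top(s) for s in word) for word in words)


phrase = [["wǒ"], ["měi", "tiān"], ["liàn", "xí"], ["Hàn", "yǔ"]]
print(sentence_to_top(phrase))
```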

See the first link below for details.

further reading:

Taipei MRT stations — a list giving Hanyu Pinyin with tone marks

When Taipei’s MRT system (which is mainly a subway system but which also has elevated portions and even sections at ground level) opened, most of its signage was in bastardized Wade-Giles, with the “English” pronunciation of the station names broadcast in the cars resembling a hideous parody of the speech of an especially clueless foreign visitor. Fortunately, the romanization was switched to Hanyu Pinyin and the English announcements were re-recorded to give pronunciations that much more accurately reflected the Mandarin station names.

Unfortunately, English announcements have been added in recent months that feature a high-pitched voice that is probably intended to be ke’ai (“cute”) but which is actually cloying. These must die, die, die! But I’m straying from the main topic.

Anyway, the MRT’s current signage, nicely designed as most of it is, does not give any tone marks. Nor does it provide Pinyin for the station names that are translated into English. And there are also a few mistakes that really need to be corrected in the official forms of the names.

So, I have updated and added some minor corrections to the lists I put up long ago on my first Web site, Romanization.com. The new versions, here on Pinyin Info, are here: Taipei MRT stations in Chinese characters, Hanyu Pinyin, and some English.