Pinyin sort order

The standard for alphabetically sorting Hanyu Pinyin is given in the ABC dictionary series edited by John DeFrancis and issued by the University of Hawaii Press.

Here’s the basic idea:

The ordering is primarily alphabetical. Diacritical marks, punctuation, juncture, and capitalization are taken into account only when the strings being compared are otherwise identical. For example, píng’ān sorts before pīnyīn, because pingan sorts before pinyin, because g precedes y alphabetically.

Only when two strings are alphabetically identical is non-alphabetical information taken into account.
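As a rough illustration of that first step (my own sketch, not code from the ABC series), the Python below reduces a Pinyin string to its bare letters by decomposing each character and discarding everything that is not a letter. The function name base_letters is made up for this example.

```python
import unicodedata

def base_letters(pinyin: str) -> str:
    """Return only the plain letters of a Pinyin string, lowercased.

    NFD decomposition splits a letter such as "ā" into "a" plus a combining
    macron. Combining marks, apostrophes, hyphens and spaces are not
    alphabetic, so the isalpha() filter drops them.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if ch.isalpha()).lower()

# Sorting on bare letters alone already puts píng’ān ahead of pīnyīn.
words = ["pīnyīn", "píng’ān"]
print(sorted(words, key=base_letters))
```

This covers only the alphabetical comparison; tones, ü, apostrophes, and capitalization still need tie-breakers of their own.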

The series’ Reader’s Guide presents the specifics of the sort order. Since I don’t have to worry about how much space this takes up on my site, I have reformatted the information slightly to give the examples as numbered lists.

Head entry transcriptions with the same sequence of letters are ordered first strictly by letter sequence regardless of tones, then by initial syllable tone in the sequence 0 1 2 3 4. For entries with the same initial tone, arrangement is by the tone of the second syllable, again in the order 0 1 2 3 4. For example:

  1. shīshi
  2. shīshī
  3. shīshí
  4. shīshǐ
  5. shīshì
  6. shíshī
  7. shíshì
  8. shǐshī
  9. shìshī

Irrespective of tones, entries with the vowel u precede those with ü.
For example:

  1. l?
  2. l?
  3. l?
  4. l?
  1. n?

Entries without apostrophe precede those with apostrophe. For example:

  1. biàn (argue)
  2. bǐ’àn (the other shore)

Lower-case entries precede upper-case entries. For example:

  1. hòujìn (aftereffect)
  2. Hòu Jìn (Later Jin dynasty)

For entries with identical spelling, including tones, arrangement is by order of frequency….

For most users, the most important thing to note is that the neutral tone is regarded as 0, not as 5. Thus, the order is not “ā á ǎ à a” but “a ā á ǎ à.” And, because lowercase comes before uppercase, not “A a Ā ā Á á Ǎ ǎ À à” but “a A ā Ā á Á ǎ Ǎ à À.”

One can see this in action in the A entries for the ABC English-Chinese, Chinese-English Dictionary. And here are some sample pages from an earlier ABC dictionary.

The ABC series follows the example of the Hanyu Pinyin Cihui (汉语拼音词汇 / Hànyǔ Pīnyīn Cíhuì) (example), with only one minor difference, as noted by Tom Bishop:

HPC [Hanyu Pinyin Cihui] gave hyphens and spaces the same priority as apostrophes, so that lìg?ng sorted before l?-g?ng, in spite of the tones. Usage of hyphens and spaces in pinyin is still far from being fully standardized. (The same is true in English orthography.) Consequently, for collation it makes sense to give less weight to hyphens and spaces, and more weight to tones, thus sorting l?-g?ng before lìg?ng. In ABC, hyphens and spaces don’t affect the sort order unless they change the pronunciation in the same way that apostrophe would; for example, 1míng-àn ?? and 2míng’àn ?? are treated as homophones, and they sort after m?ng?n ??.
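
Pulling the rules above together, here is a rough Python sort key. It is a sketch under several assumptions, not the ABC series’ actual implementation. The name pinyin_sort_key is made up. The ranking of the tie-breakers (ü, then apostrophe, then tones, then capitalization) is my reading of “irrespective of tones” and of Tom Bishop’s remark, and the relative rank of ü versus apostrophe is a guess. Hyphens and spaces are simply ignored, which glosses over the cases where they act like an apostrophe. Ordering by frequency is not modeled at all. Tones are recorded letter by letter rather than syllable by syllable, which gives the same result whenever two entries share the same letters and syllable breaks.

```python
import unicodedata

# Combining marks produced by NFD decomposition, mapped to tone numbers.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
DIAERESIS = "\u0308"            # the two dots of ü
APOSTROPHES = {"'", "\u2019"}   # straight and curly

def pinyin_sort_key(entry: str):
    """Build a tuple that sorts Pinyin roughly as described above."""
    letters, tones, umlauts, uppers, apostrophes = [], [], [], [], []
    for ch in unicodedata.normalize("NFD", entry):
        if ch in TONE_MARKS and letters:
            tones[-1] = TONE_MARKS[ch]        # tone mark on the previous letter
        elif ch == DIAERESIS and letters:
            umlauts[-1] = True                # previous letter is ü, not u
        elif ch in APOSTROPHES:
            apostrophes.append(len(letters))  # position of the syllable break
        elif ch.isalpha():
            letters.append(ch.lower())
            tones.append(0)                   # 0 = no mark (neutral tone)
            umlauts.append(False)
            uppers.append(ch.isupper())
        # hyphens, spaces, and anything else are ignored in this sketch
    return (
        "".join(letters),    # 1. plain alphabetical order
        tuple(umlauts),      # 2. u before ü (assumed rank)
        tuple(apostrophes),  # 3. no apostrophe before apostrophe (assumed rank)
        tuple(tones),        # 4. tones in the order 0 1 2 3 4
        tuple(uppers),       # 5. lowercase before uppercase
    )

print(sorted("À à Ǎ ǎ Á á Ā ā A a".split(), key=pinyin_sort_key))
print(sorted(["Hòu Jìn", "bǐ’àn", "hòujìn", "biàn"], key=pinyin_sort_key))
```

Because Python compares tuples element by element, each later component is consulted only when all earlier ones are equal, which mirrors the “only when otherwise identical” principle.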

How to handle ‘de’ and interjections in Hanyu Pinyin

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) deals with how to write Mandarin’s various de’s, mood particles, and interjections.

This reading is available in two versions:

  • simplified Chinese characters: ???? ????? (zhùcí, tàncí)
  • traditional Chinese characters: ???? ?????

I’ve already written about the principles in previous posts.

How to write numbers and measure words in Hanyu Pinyin

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) is about writing numbers and measure words.

This reading is available in two versions:

For more on this, see these posts and the PDFs linked to therein.

How to write verbs in Hanyu Pinyin (Mandarin text)

Here’s the first of several selected readings from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》). It covers the writing of verbs.

This reading is available in two versions:

  • simplified Chinese characters: ???? ??
  • traditional Chinese characters: ???? ??

For those who would like to read about this in English, see

Important book on Pinyin to be excerpted on this site

Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) is the second of Yin Binyong’s two books on Pinyin orthography. The first, Chinese Romanization: Pronunciation and Orthography, is in English and Mandarin; much of it is already available here on Pinyin.Info.

Although Xinhua Pinxie Cidian is only in Mandarin, the large number of examples makes it easy to get the point even if you may not read Mandarin in Chinese characters very well.

This week I will begin posting some excerpts from this invaluable work. What’s more, I have made a version in traditional Chinese characters, which I hope that readers in Taiwan, Hong Kong, and elsewhere will take advantage of. So those not used to reading simplified Chinese characters will have a choice (which is more than the government of Taiwan is providing these days).

I’m extremely happy to be able to bring you this information and wish to acknowledge the generosity of the Commercial Press. Stay tuned.

ɑ vs. a

About a year ago (which is roughly how overdue this post is), a commenter noted that some Chinese publishers “are convinced that Pinyin must be printed with ɑ (single-story „Latin alpha“, as opposed to double-story a), and with ɡ (single story; not double story g).”

But does Hanyu Pinyin in fact call for this longstanding Chinese habit of bad typography? This was one of the first questions I asked of Zhou Youguang, the father of Hanyu Pinyin, when I met with him: Are those who insist upon the ɑ-style letter correct?

“Oh, no,” Zhou replied. “That ‘ɑ’ is just for babies!” And he laughed that wonderful laugh of his that no doubt has contributed to his remarkable longevity.

Zhou was referring to the facts that the “ɑ” style of letter is usually found specifically in books for infants … and that this style generally does not belong elsewhere. In fact, ɑ and ɡ (written thusly) are often referred to as infant characters. A variant of the letter y is sometimes included in this set.

Letters in that style are also found in the West — but almost always in books for toddlers, and often not even in those. Furthermore, even in those cases the use of such letters appears to have no positive effect on children’s reading.

The correct-style letters for Pinyin are the same as those for English, Zhou stated.

I hope that anyone who has been using “ɑ” will both officially and in practice switch to “a”. It’s long past time that the supposed rule calling for “ɑ” was treated as a dead letter.

Long live good typography!

-r endings, their pronunciation, and Pinyin spelling

Arrr! In recognition of International Talk Like a Beijinger Pirate Day, here are the rules for how to spell those -r endings in Hanyu Pinyin and how those endings affect the pronunciation of syllables. In many cases, it’s more complicated than just adding an -r sound at the end of the standard syllable.

This information is from Yin Binyong’s Chinese Romanization: Pronunciation and Orthography. The full section from this book is available in PDF form: r- Suffixed Syllables.

Written form                    Actual pronunciation
-ar (mǎr, horse)                -ar (mǎr)
-air (gàir, lid)                -ar (gàr)
-anr (pánr, plate)              -ar (pár)
-aor (bāor, bundle)             -aor (bāor)
-angr (gāngr, jar)              -ãr (gãr)
-or (mòr, dust)                 -or (mòr)
-our (hóur, monkey)             -our (hóur)
-ongr (chóngr, insect)          -õr (chõr)
-er (gēr, song)                 -er (gēr)
-eir (bèir, back)               -er (bèr)
-enr (ménr, door)               -er (mér)
-engr (dēngr, lamp)             -ẽr (dẽr)
-ir* (zìr, Chinese character)   -er (zèr)
-ir (mǐr, rice)                 -ier (mǐer)
-iar (xiár, box)                -iar (xiár)
-ier (diér, saucer)             -ier (diér)
-iaor (niǎor, bird)             -iaor (niǎor)
-iur (qiúr, ball)               -iour (qióur)
-ianr (diǎnr, bit)              -iar (diǎr)
-iangr (qiāngr, tune)           -iãr (qiãr)
-inr (xīnr, core)               -ier (xīer)
-ingr (língr, bell)             -iẽr (liẽr)
-iongr (xióngr, bear)           -iõr (xiõr)
-ur (tùr, rabbit)               -ur (tùr)
-uar (huār, flower)             -uar (huār)
-uor (huór, work)               -uor (huór)
-uair (kuàir, piece)            -uar (kuàr)
-uir (shuǐr, water)             -uer (shuěr)
-uanr (wánr, to play)           -uar (wár)
-uangr (kuāngr, basket)         -uãr (kuãr)
-unr (lúnr, wheel)              -uer (luér)
-ür (qǔr, song)                 -üer (qǔer)
-üer (juér, peg)                -üer (juér)
-üanr (quānr, loop)             -üar (quār)
-ünr (qúnr, skirt)              -üer (quér)

Notes:

  • ã, õ, ẽ indicate nasalized a, o, e.
  • The -i marked with an asterisk indicates either of the apical vowels that follow zh, ch, sh, r and z, c, s.
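
For anyone who wants to play with the table, here is a small Python sketch that stores it as a lookup from written final to pronounced final (tone marks left out) and then lists which written endings collapse into the same sound. The names ERHUA_FINALS and mergers are my own, the key "ir*" stands for the asterisked -ir of the table, and ẽ stands for nasalized e as in the notes.

```python
from collections import defaultdict

# Written -r final -> pronounced final, row by row from the table above.
ERHUA_FINALS = {
    "ar": "ar", "air": "ar", "anr": "ar", "aor": "aor", "angr": "ãr",
    "or": "or", "our": "our", "ongr": "õr",
    "er": "er", "eir": "er", "enr": "er", "engr": "ẽr",
    "ir*": "er", "ir": "ier", "iar": "iar", "ier": "ier", "iaor": "iaor",
    "iur": "iour", "ianr": "iar", "iangr": "iãr", "inr": "ier",
    "ingr": "iẽr", "iongr": "iõr",
    "ur": "ur", "uar": "uar", "uor": "uor", "uair": "uar", "uir": "uer",
    "uanr": "uar", "uangr": "uãr", "unr": "uer",
    "ür": "üer", "üer": "üer", "üanr": "üar", "ünr": "üer",
}

# Invert the table: pronounced final -> every written final that yields it.
mergers = defaultdict(list)
for written, spoken in ERHUA_FINALS.items():
    mergers[spoken].append(written)

# Show only the pronunciations shared by more than one written ending.
for spoken, written_forms in mergers.items():
    if len(written_forms) > 1:
        print(f"-{spoken}: " + ", ".join(f"-{w}" for w in written_forms))
```

The lookup works on finals only. Applying it to whole written syllables would also need the usual spelling rules (for example, ü written as u after j, q, and x), which this sketch does not attempt.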