Pinyin sort order

The standard for alphabetically sorting Hanyu Pinyin is given in the ABC dictionary series edited by John DeFrancis and issued by the University of Hawaii Press.

Here’s the basic idea:

The ordering is first of all simply alphabetical. Diacritical marks, punctuation, juncture, and capitalization are taken into account only when the strings being compared are otherwise identical. For example, píng’ān sorts before pīnyīn, because pingan sorts before pinyin, because g precedes y alphabetically.

Only when two strings are alphabetically identical is non-alphabetical information taken into account.
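As a rough illustration of that first pass, here is a minimal Python sketch. The function name letters_only is made up for this example, and the sketch covers only the alphabetical comparison, not the tie-breaking rules. It also simply folds ü into u, which is an assumption on my part.

```python
import unicodedata

def letters_only(pinyin: str) -> str:
    """Primary sort key: base letters only, lowercased.

    NFD decomposition splits each accented letter into a base letter
    plus combining marks; keeping only alphabetic characters drops the
    marks, apostrophes, hyphens, and spaces.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch.lower() for ch in decomposed if ch.isalpha())

words = ["pīnyīn", "píng’ān"]
print(sorted(words, key=letters_only))
```

Because "pingan" compares lower than "pinyin", píng’ān comes out first, whatever its tones.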

The series’ Reader’s Guide presents the specifics of the sort order. Since I don’t have to worry about how much space this takes up on my site, I have reformatted the information slightly to give the examples as numbered lists.

Head entry transcriptions with the same sequence of letters are ordered first strictly by letter sequence regardless of tones, then by initial syllable tone in the sequence 0 1 2 3 4. For entries with the same initial tone, arrangement is by the tone of the second syllable, again in the order 0 1 2 3 4. For example:

  1. shīshi
  2. shīshī
  3. shīshí
  4. shīshǐ
  5. shīshì
  6. shíshī
  7. shíshì
  8. shǐshī
  9. shìshī
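The tone rule above can be sketched in Python as a two-part key: the letters first, then a tuple of tone numbers. This is only a sketch, and it assumes the entry has already been split into syllables, since an unmarked (neutral-tone) syllable cannot be located without segmentation. The names TONE_MARKS, syllable_tone, and tone_key are mine, not from the ABC Reader’s Guide.

```python
import unicodedata

# Combining marks for tones 1-4; a syllable with no mark is neutral (0).
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}

def syllable_tone(syllable):
    """Return 0-4 for one syllable, based on its tone mark (if any)."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return 0

def tone_key(syllables):
    """Sort key for a pre-segmented entry: (letters, tones per syllable)."""
    decomposed = unicodedata.normalize("NFD", "".join(syllables))
    letters = "".join(ch.lower() for ch in decomposed if ch.isalpha())
    tones = tuple(syllable_tone(s) for s in syllables)
    return (letters, tones)

entries = [["shì", "shī"], ["shī", "shí"], ["shī", "shi"], ["shí", "shì"]]
for entry in sorted(entries, key=tone_key):
    print("".join(entry), tone_key(entry)[1])
```

Python compares tuples element by element, so the first syllable’s tone decides the order and the second syllable’s tone only breaks ties, which is the behavior the rule describes.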

Irrespective of tones, entries with the vowel u precede those with ü.
For example:

  1. l?
  2. l?
  3. l?
  4. l?
  1. n?
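
One way to sketch the u-before-ü rule in Python is to record, for each u in the entry, whether it carries a diaeresis, and to compare that record before any tone information. The function name and the sample words lù and lǘ are my own illustrations. Treating ü as a tie-breaker on top of a shared base letter u is also an assumption; the rule as quoted does not say how ü behaves in the primary alphabetical comparison.

```python
import unicodedata

DIAERESIS = "\u0308"  # the two dots of ü after NFD decomposition

def u_before_umlaut_key(pinyin):
    """Key of (letters, diaeresis flags): 0 for plain u, 1 for ü."""
    letters = []
    flags = []
    for ch in unicodedata.normalize("NFD", pinyin):
        if ch.isalpha():
            letters.append(ch.lower())
            if ch.lower() == "u":
                flags.append(0)
        elif ch == DIAERESIS and letters and letters[-1] == "u":
            flags[-1] = 1  # the u just seen is really ü
    return ("".join(letters), tuple(flags))

print(sorted(["lǘ", "lù"], key=u_before_umlaut_key))
```

Here lù (fourth tone) sorts ahead of lǘ (second tone), because the diaeresis flag is compared and the tones never enter into it.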

Entries without apostrophe precede those with apostrophe. For example:

  1. biàn (argue)
  2. bǐ’àn (the other shore)
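
Note that in this pair the apostrophe decides the order even though the third tone of bǐ would otherwise come before the fourth tone of biàn. A minimal Python sketch of that comparison follows; apostrophe_key is a hypothetical name, and the sketch looks only at letters and the presence of an apostrophe.

```python
import unicodedata

APOSTROPHES = ("'", "\u2019")  # straight and typographic apostrophes

def apostrophe_key(pinyin):
    """Key of (letters, has_apostrophe); False sorts before True."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    letters = "".join(ch.lower() for ch in decomposed if ch.isalpha())
    has_apostrophe = any(ch in APOSTROPHES for ch in pinyin)
    return (letters, has_apostrophe)

print(sorted(["bǐ’àn", "biàn"], key=apostrophe_key))
```

Both entries reduce to the letters "bian", so the apostrophe flag is what puts biàn first.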

Lower-case entries precede upper-case entries. For example:

  1. hòujìn (aftereffect)
  2. Hòu Jìn (Later Jin dynasty)

For entries with identical spelling, including tones, arrangement is by order of frequency….

For most users, the most important thing to note is that the neutral tone is regarded as 0, not as 5. Thus, the order is not “ā á ǎ à a,” but “a ā á ǎ à.” And, because lowercase comes before uppercase, not “A a Ā ā Á á Ǎ ǎ À à” but “a A ā Ā á Á ǎ Ǎ à À.”
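
For single syllables, the two points in that paragraph can be sketched in Python with a three-part key: letters, then tone with neutral counted as 0, then case. The names here are my own, and the sketch handles one syllable at a time only.

```python
import unicodedata

# Combining marks for tones 1-4; no mark means neutral tone, counted as 0.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}

def single_syllable_key(syllable):
    """Key of (letters, tone, is_upper) for a single syllable."""
    decomposed = unicodedata.normalize("NFD", syllable)
    letters = "".join(ch.lower() for ch in decomposed if ch.isalpha())
    tone = next((TONE_MARKS[ch] for ch in decomposed if ch in TONE_MARKS), 0)
    is_upper = any(ch.isupper() for ch in decomposed)
    return (letters, tone, is_upper)

forms = ["À", "à", "Ǎ", "ǎ", "Á", "á", "Ā", "ā", "A", "a"]
print(" ".join(sorted(forms, key=single_syllable_key)))
# prints: a A ā Ā á Á ǎ Ǎ à À
```

Placing case last in the tuple is what makes a lowercase form immediately precede its uppercase twin within each tone.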

One can see this in action in the A entries for the ABC English-Chinese, Chinese-English Dictionary. And here are some sample pages from an earlier ABC dictionary.

The ABC series follows the example of the Hanyu Pinyin Cihui (汉语拼音词汇 / Hànyǔ Pīnyīn Cíhuì) (example), with only one minor difference, as noted by Tom Bishop:

HPC [Hanyu Pinyin Cihui] gave hyphens and spaces the same priority as apostrophes, so that lìgōng sorted before lǐ-gōng, in spite of the tones. Usage of hyphens and spaces in pinyin is still far from being fully standardized. (The same is true in English orthography.) Consequently, for collation it makes sense to give less weight to hyphens and spaces, and more weight to tones, thus sorting lǐ-gōng before lìgōng. In ABC, hyphens and spaces don’t affect the sort order unless they change the pronunciation in the same way that apostrophe would; for example, ¹míng-àn ?? and ²míng’àn ?? are treated as homophones, and they sort after mǐngǎn 敏感.
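
One possible reading of the ABC treatment, sketched in Python: a hyphen or space counts as juncture only where an apostrophe would be required, which I take to mean before a syllable beginning with a, o, or e. That interpretation, along with the name juncture_positions, is my assumption and not something stated in the quotation.

```python
import unicodedata

# Straight apostrophe, typographic apostrophe, hyphen, space.
SEPARATORS = {"'", "\u2019", "-", " "}

def juncture_positions(pinyin):
    """Letter offsets where a separator comes right before a, o, or e.

    Separators before any other letter are ignored, so they cannot
    affect a sort key built from this value.
    """
    positions = []
    letter_count = 0
    pending = False  # True once a separator has just been seen
    for ch in unicodedata.normalize("NFD", pinyin):
        if ch in SEPARATORS:
            pending = True
        elif ch.isalpha():
            if pending and ch.lower() in "aoe":
                positions.append(letter_count)
            pending = False
            letter_count += 1
    return tuple(positions)

for word in ["míng-àn", "míng’àn", "lǐ-gōng", "lìgōng"]:
    print(word, juncture_positions(word))
```

Under this reading míng-àn and míng’àn yield the same value, so they collate as homophones, while the hyphen in lǐ-gōng is dropped and the tones are left to decide its order relative to lìgōng.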

5 thoughts on “Pinyin sort order”

  1. How are homophones with the same Pinyin organized in ABC English-Chinese Dictionary?

    “ 1. ài ? (10?)
    2. ài ? (10?)
    3. ài ? (13?)
    4. ài ? (5?)
    5. ài ? (12?)
    6. ài ?(13?) “

  2. More specifically: “For entries with identical spelling, including tones, arrangement is by order of frequency, indicated by a raised number before the transcription, a device adapted from Western lexicographic practice to distinguish homonyms. In the case of monosyllabic entries, our frequency order is based largely on Xiandai Hanyu Pinlü Cidian. In the case of entries of more than one syllable, we have also made use of Zhongwen Shumianyu Pinlü Cidian. For entries not found in either work, we have made subjective judgments of relative frequency. For entries that are homographic if tones are disregarded, the item of highest frequency is indicated by an asterisk following the transcription.”

  3. My Google Chrome browser can’t detect which encoding I should use in order to read the discussion and comments.
    Please let me know. Thanks!
