Pinyin sort order

The standard for alphabetically sorting Hanyu Pinyin is given in the ABC dictionary series edited by John DeFrancis and issued by the University of Hawaii Press.

Here’s the basic idea:

The ordering is, first and foremost, simply alphabetical. Diacritical marks, punctuation, juncture and capitalization are only taken into account when the strings being compared are otherwise identical. For example, píng’ān sorts before pīnyīn, because pingan sorts before pinyin, because g precedes y alphabetically.

Only when two strings are alphabetically identical is non-alphabetical information taken into account.

The series’ Reader’s Guide presents the specifics of the sort order. Since I don’t have to worry about how much space this takes up on my site, I have reformatted the information slightly to give the examples as numbered lists.

Head entry transcriptions with the same sequence of letters are ordered first strictly by letter sequence regardless of tones, then by initial syllable tone in the sequence 0 1 2 3 4. For entries with the same initial tone, arrangement is by the tone of the second syllable, again in the order 0 1 2 3 4. For example:

  1. shīshi
  2. shīshī
  3. shīshí
  4. shīshǐ
  5. shīshì
  6. shíshī
  7. shíshì
  8. shǐshī
  9. shìshī

Irrespective of tones, entries with the vowel u precede those with ü. For example:

Entries without apostrophe precede those with apostrophe. For example:

  1. biàn (argue)
  2. bǐ’àn (the other shore)

Lower-case entries precede upper-case entries. For example:

  1. hòujìn (aftereffect)
  2. Hòu Jìn (Later Jin dynasty)

For entries with identical spelling, including tones, arrangement is by order of frequency….

For most users, the most important thing to note is that the neutral tone is regarded as 0, not as 5. Thus, the order is not “ā á ǎ à a,” but “a ā á ǎ à.” And, because lowercase comes before uppercase, not “A a Ā ā Á á Ǎ ǎ À à” but “a A ā Ā á Á ǎ Ǎ à À.”

One can see this in action in the A entries for the ABC English-Chinese, Chinese-English Dictionary. And here are some sample pages from an earlier ABC dictionary.

The ABC series follows the example of the Hanyu Pinyin Cihui (汉语拼音词汇 / 漢語拼音詞彙 / Hànyǔ Pīnyīn Cíhuì) (example), with only one minor difference, as noted by Tom Bishop:

HPC [Hanyu Pinyin Cihui] gave hyphens and spaces the same priority as apostrophes, so that lìgōng sorted before lǐ-gōng, in spite of the tones. Usage of hyphens and spaces in pinyin is still far from being fully standardized. (The same is true in English orthography.) Consequently, for collation it makes sense to give less weight to hyphens and spaces, and more weight to tones, thus sorting lǐ-gōng before lìgōng. In ABC, hyphens and spaces don’t affect the sort order unless they change the pronunciation in the same way that apostrophe would; for example, ¹míng-àn 明暗 and ²míng’àn 冥暗 are treated as homophones, and they sort after mǐngǎn 敏感.
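
The rules above can be sketched as a Python sort key. This is a minimal illustration under stated assumptions, not the ABC editors' own procedure. It compares five tiers in order:

1. the letters alone, with tones stripped and ü folded into u;
2. u before ü;
3. entries without an apostrophe before those with one;
4. syllable tones, with the neutral tone as 0;
5. lowercase before uppercase.

The apostrophe tier sits above tones because the biàn / bǐ’àn example requires it: tones alone would put bǐ’àn first. Tones sit above case, as the a A ā Ā … sequence shows.

The rest is assumption:

- The relative order of the u/ü and apostrophe tiers is a guess.
- Syllables are found heuristically by treating each run of vowels as one syllable. This holds for standard pinyin, but not for vowelless interjections such as ń.
- A hyphen or space counts as an apostrophe only when it comes before a, o or e. This is one reading of Tom Bishop's description.
- The frequency tiebreak is not modeled, so true ties keep their input order under Python's stable sort.
- The helper name pinyin_sort_key is hypothetical.

```python
import unicodedata

# Combining marks that NFD decomposition produces for tone-marked vowels.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
DIAERESIS = "\u0308"                 # the two dots of ü
APOSTROPHES = {"'", "\u2019"}        # straight and curly apostrophes
SEPARATORS = {"-", " "}              # hyphen and space
VOWELS = set("aeiou")


def pinyin_sort_key(word):
    """Sort key for a pinyin headword (hypothetical helper, a sketch only).

    Tiers, compared in order:
      1. letters, lowercased, tones removed, ü folded into u
      2. the same letters with ü kept distinct, so u sorts before ü
      3. False before True: has an apostrophe (or an equivalent break)
      4. syllable tones in order, neutral tone = 0
      5. per-letter capitalization, lowercase before uppercase
    """
    letters, with_umlaut, caps, tones = [], [], [], []
    has_break = False
    in_vowel_run = False
    after_separator = False

    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_MARKS:
            if tones:                      # mark belongs to the current syllable
                tones[-1] = TONE_MARKS[ch]
            continue
        if ch == DIAERESIS:
            if with_umlaut:
                with_umlaut[-1] = "ü"      # 'ü' > 'u' in code-point order
            continue
        if unicodedata.combining(ch):      # ignore any other diacritic (e.g. ê)
            continue
        if ch in APOSTROPHES:
            has_break = True
            in_vowel_run = False
            continue
        if ch in SEPARATORS:
            after_separator = True
            in_vowel_run = False
            continue

        base = ch.lower()
        if after_separator and base in "aoe":
            # Assumption: a hyphen or space before a/o/e splits syllables
            # exactly as an apostrophe would, so it counts as one.
            has_break = True
        after_separator = False

        if base in VOWELS:
            if not in_vowel_run:           # heuristic: one vowel run = one syllable
                tones.append(0)            # neutral until a tone mark shows up
            in_vowel_run = True
        else:
            in_vowel_run = False

        letters.append(base)
        with_umlaut.append(base)
        caps.append(ch.isupper())

    return ("".join(letters), "".join(with_umlaut), has_break,
            tuple(tones), tuple(caps))
```

Sorted with key=pinyin_sort_key, the nine shīshi entries above come out in the listed order. The same key puts biàn before bǐ’àn, lǐ-gōng before lìgōng, and mǐngǎn before both míng’àn and míng-àn. Returning a tuple lets Python's ordinary tuple comparison apply the tiers one after another, so a later tier is consulted only when all earlier ones are equal.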

Oxford Chinese Dictionary goes online

Oxford University Press has just announced that its massive Oxford Chinese Dictionary is now available through its Oxford Language Dictionaries Online subscription service.

I haven’t seen the online version yet myself; but from the publisher’s description it appears to be largely the same as the published edition, whose paucity of Pinyin is disappointing. The publisher, however, is promising that “Pinyin will be added to all Chinese translations” in November, which should be a major step forward.

Perhaps some of you at universities have institutional access. I would welcome reports.

source: What’s New, Oxford Language Dictionaries Online, May 2011.

Find Chinese characters online by drawing them with your mouse

Nciku, a Web site that bills itself as “more than a dictionary,” has a nifty feature that allows users to find Chinese characters by drawing them with a mouse.

[Image: interface for the character-drawing tool]

As you draw, possible character matches will appear in the box to the right of your drawing, with the results refined as your drawing progresses. You don’t need to know the canonical stroke order to get this to work, nor do your calligraphy skills need to be perfect, as this example shows.

[Image: the drawing tool, showing the results with a sloppily drawn 拼 (the “pin” of “Pinyin”)]

Once you see the correct character offered as a choice, click on it and it will be entered into the search box for the site’s online dictionary. This dictionary feature can handle multiple-character input and will even prompt you with likely choices to fill out your search.

via Keywords

Massive Korean dictionary of Chinese characters nears completion

The final volumes in what is being touted as the world’s largest Chinese character dictionary are scheduled to be published in May.

The fifteen-volume work (excluding the index) will reportedly cover some 60,000 Chinese characters and include about 500,000 Sinitic words. By comparison, the Zhongwen da cidian (中文大辭典 / Zhōngwén dà cídiǎn), published in Taiwan in the 1960s, covers 49,905 Chinese characters.

The project was initiated by the Institute of Oriental Studies of Dankook University, South Korea, in 1978.

The first volume of the 『漢韓大辭典』 (in Mandarin: Hàn-Hán dà cídiǎn; “Dictionary of Chinese characters Korean use,” as it is translated on the institute’s Web site) was issued in 1999. Last year, volumes 10–12 were published.

The project has reportedly cost more than W20 billion (US$21.3 million).

Yet more work may still be needed.

Prof. Kim Eon-jong of the Department of Korean Literature in Classical Chinese at Korea University said, “This project has great significance from the standpoint of cultural history. But it’s a pity that the institute hastened the final stage. It must complement and supplement the dictionary later.”

sources: