Pinyin sort order

The standard for alphabetically sorting Hanyu Pinyin is given in the ABC dictionary series edited by John DeFrancis and issued by the University of Hawaii Press.

Here’s the basic idea:

The ordering is primarily alphabetical. Diacritical marks, punctuation, juncture and capitalization are only taken into account when the strings being compared are otherwise identical. For example, píng’ān sorts before pīnyīn, because pingan sorts before pinyin, because g precedes y alphabetically.

Only when two strings are alphabetically identical is non-alphabetical information taken into account.

The series’ Reader’s Guide presents the specifics of the sort order. Since I don’t have to worry about how much space this takes up on my site, I have reformatted the information slightly to give the examples as numbered lists.

Head entry transcriptions with the same sequence of letters are ordered first strictly by letter sequence regardless of tones, then by initial syllable tone in the sequence 0 1 2 3 4. For entries with the same initial tone, arrangement is by the tone of the second syllable, again in the order 0 1 2 3 4. For example:

  1. shīshi
  2. shīshī
  3. shīshí
  4. shīshǐ
  5. shīshì
  6. shíshī
  7. shíshì
  8. shǐshī
  9. shìshī

Irrespective of tones, entries with the vowel u precede those with ü.
For example:

  1. l?
  2. l?
  3. l?
  4. l?
  5. n?

Entries without apostrophe precede those with apostrophe. For example:

  1. biàn (argue)
  2. bǐ’àn (the other shore)

Lower-case entries precede upper-case entries. For example:

  1. hòujìn (aftereffect)
  2. Hòu Jìn (Later Jin dynasty)

For entries with identical spelling, including tones, arrangement is by order of frequency….

For most users, the most important thing to note is that the neutral tone is regarded as 0, not as 5. Thus, the order is not “ā á ǎ à a,” but “a ā á ǎ à.” And, because lowercase comes before uppercase, not “A a Ā ā Á á Ǎ ǎ À à” but “a A ā Ā á Á ǎ Ǎ à À.”
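The rules above can be sketched as a Python sort key. This is a minimal illustration under stated assumptions, not the ABC editors' own procedure. It assumes well-formed, tone-marked Pinyin in which matching letters carry their tone marks on the same vowel, so tones can be compared letter by letter instead of syllable by syllable. The priority of the u/ü rule relative to the apostrophe rule is assumed here, spaces and hyphens are simply ignored, and the frequency tiebreak for identical spellings is not modeled.

```python
import unicodedata

# Combining marks that NFD decomposition splits off tone-marked Pinyin vowels.
TONE_MARKS = {
    "\u0304": 1,  # macron (ā)
    "\u0301": 2,  # acute (á)
    "\u030c": 3,  # caron (ǎ)
    "\u0300": 4,  # grave (à)
}
DIAERESIS = "\u0308"           # the two dots of ü
APOSTROPHES = {"'", "\u2019"}  # straight and curly apostrophes
IGNORED = {" ", "-"}           # assumption: spaces and hyphens never count


def abc_sort_key(word: str) -> tuple:
    """Collation key approximating the ABC Reader's Guide rules.

    Fields are compared in this order:
      1. bare letters (tones, umlauts, case and punctuation removed)
      2. u before ü, letter by letter
      3. no apostrophe before apostrophe
      4. tones, left to right, with the neutral tone as 0
      5. lowercase before uppercase, letter by letter
    """
    letters, umlauts, tones, uppers = [], [], [], []
    has_apostrophe = False
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_MARKS:
            tones[-1] = TONE_MARKS[ch]  # the mark belongs to the preceding vowel
        elif ch == DIAERESIS:
            umlauts[-1] = 1
        elif ch in APOSTROPHES:
            has_apostrophe = True
        elif ch in IGNORED:
            continue
        else:
            letters.append(ch.lower())
            umlauts.append(0)
            tones.append(0)  # unmarked letters (and neutral syllables) count as 0
            uppers.append(1 if ch.isupper() else 0)
    return ("".join(letters), umlauts, has_apostrophe, tones, uppers)


words = ["shìshī", "shīshí", "shíshī", "shīshi", "shǐshī",
         "shīshì", "shíshì", "shīshǐ", "shīshī"]
print(sorted(words, key=abc_sort_key))
# ['shīshi', 'shīshī', 'shīshí', 'shīshǐ', 'shīshì', 'shíshī', 'shíshì', 'shǐshī', 'shìshī']
```

Because every field after the first is consulted only when the bare letters match, the per-letter lists always have equal lengths, so ordinary tuple comparison enforces the principle that non-alphabetical information matters only for otherwise identical strings.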

One can see this in action in the A entries for the ABC English-Chinese, Chinese-English Dictionary. And here are some sample pages from an earlier ABC dictionary.

The ABC series follows the example of the Hanyu Pinyin Cihui (汉语拼音词汇 / Hànyǔ Pīnyīn Cíhuì), with only one minor difference, as noted by Tom Bishop:

HPC [Hanyu Pinyin Cihui] gave hyphens and spaces the same priority as apostrophes, so that lìgōng sorted before lǐ-gōng, in spite of the tones. Usage of hyphens and spaces in pinyin is still far from being fully standardized. (The same is true in English orthography.) Consequently, for collation it makes sense to give less weight to hyphens and spaces, and more weight to tones, thus sorting lǐ-gōng before lìgōng. In ABC, hyphens and spaces don’t affect the sort order unless they change the pronunciation in the same way that apostrophe would; for example, ¹míng-àn ?? and ²míng’àn ?? are treated as homophones, and they sort after mǐngǎn 敏感.
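The difference Bishop describes comes down to where hyphens and spaces sit in the priority order. The Python sketch below contrasts the two schemes; it is only an illustration. It assumes the pair in his example is lìgōng and lǐ-gōng, and it reads the ABC exception as "a hyphen or space counts as juncture only before a syllable beginning with a, o, or e," which is where Hanyu Pinyin would otherwise call for an apostrophe. That heuristic, the function name `juncture_key`, and its `scheme` parameter are assumptions made for this example. Case and the u/ü distinction are left out to keep it short.

```python
import unicodedata

TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
APOSTROPHES = {"'", "\u2019"}
HYPHEN_OR_SPACE = {"-", " "}
SYLLABLE_INITIAL_VOWELS = {"a", "o", "e"}  # where Pinyin would need an apostrophe


def juncture_key(word: str, scheme: str = "abc") -> tuple:
    """Letters first, then a juncture flag, then tones (neutral tone = 0).

    scheme="hpc": hyphens and spaces rank with apostrophes, ahead of tones.
    scheme="abc": hyphens and spaces count only before a syllable starting
                  with a, o, or e (a heuristic for the ABC exception);
                  otherwise they are ignored and tones decide.
    """
    chars = unicodedata.normalize("NFD", word.lower())
    letters, tones = [], []
    juncture = False
    for i, ch in enumerate(chars):
        if ch in TONE_MARKS:
            tones[-1] = TONE_MARKS[ch]  # tone of the preceding vowel
        elif ch in APOSTROPHES:
            juncture = True
        elif ch in HYPHEN_OR_SPACE:
            following = chars[i + 1] if i + 1 < len(chars) else ""
            if scheme == "hpc" or following in SYLLABLE_INITIAL_VOWELS:
                juncture = True
        elif unicodedata.combining(ch):
            continue  # other marks, such as the dots of ü, are ignored here
        else:
            letters.append(ch)
            tones.append(0)
    return ("".join(letters), juncture, tones)


pair = ["lìgōng", "lǐ-gōng"]
print(sorted(pair, key=lambda w: juncture_key(w, "hpc")))  # ['lìgōng', 'lǐ-gōng']
print(sorted(pair, key=juncture_key))                       # ['lǐ-gōng', 'lìgōng']
print(sorted(["míng-àn", "mǐngǎn"], key=juncture_key))      # ['mǐngǎn', 'míng-àn']
```

Under the "abc" setting, míng-àn and míng’àn produce identical keys, matching their treatment as homophones, and both still land after mǐngǎn because the juncture flag outranks the tones.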

Remembering Y.R. Chao: 1892-1982


Y.R. Chao
November 3, 1892 – February 25, 1982

Today, the thirtieth anniversary of the death of the brilliant linguist and all-around interesting guy Y.R. Chao (Zhao Yuanren / Zhào Yuánrèn / 趙元任 / 赵元任), I’m remembering him by rereading some of his work. (Chao died twenty years and one day after his good friend Hu Shih.)

Here are some readings on Pinyin.info by or about Y.R. Chao that you may wish to review:

New database of cross-strait differences in Mandarin goes online

Last week, on the same day President Ma Ying-jeou accepted the resignation of a minister who made some drunken lewd remarks at a wěiyá (year-end office party), Ma was joking to the media about blow jobs.

Classy.

screenshot from a video of a news story on this

But it was all for a good cause, of course. You see, the Mandarin expression chuī lǎba, when not referring to the literal playing of a trumpet, is usually taken in Taiwan to refer to a blow job. But in China, Ma explained, chuī lǎba means the same thing as the idiom pāi mǎpì (pat/kiss the horse’s ass — i.e., flatter). And now that we have the handy-dandy Zhōnghuá Yǔwén Zhīshikù (Chinese Language Database), which Ma was announcing, we can look up how Mandarin differs in Taiwan and China, and thus not get tripped up by such misunderstandings. Or at least that’s supposed to be the idea.

The database, which is the result of cross-strait cooperation, can be accessed via two sites: one in Taiwan, the other in China.

It’s clear that a lot of money has been spent on this. For example, many entries are accompanied by well-documented, precise explanations by distinguished lexicographers. Ha! Just kidding! Many entries are really accompanied by videos — some two hundred of them — of cutesy puppets gabbing about cross-strait differences in Mandarin expressions. But if there’s a video in there of the panda in the skirt explaining to the sheep in the vest that a useful skill for getting ahead in Chinese society is chuī lǎba, I haven’t found it yet. Will NMA take up the challenge?

Much of the site emphasizes not so much language as Chinese characters. For example, another expensively produced video feeds the ideographic myth by showing off obscure Hanzi, such as the one for chěng.

WARNING: The screenshot below links to a video that contains scenes with intense wawa-ing and thus may not be suitable for anyone who thinks it’s not really cute for grown women to try to sound like they’re only thwee-and-a-half years old.

cheng3

In a welcome bit of synchronicity, Victor Mair posted on Language Log earlier the same week on the unpredictability of Chinese character formation and pronunciation, briefly discussing just such patterns of duplication, triplication, etc.

Mair notes:

Most of these characters are of relatively low frequency and, except for a few of them, neither their meanings nor their pronunciations are known by persons of average literacy.

Many more such characters consisting of two, three, or four repetitions of the same character exist, and their sounds and meanings are in most cases equally or more opaque.

The Hanzi for chěng (which looks like ??? run together as one character) in the video above is sufficiently obscure that it likely won’t be shown correctly in many browsers on most systems when written in real text: ????. But never fear: It’s already in Unicode and so should be appearing one of these years in a massively bloated system font.

Further reinforcing the impression that the focus is on Chinese characters, Liú Zhàoxuán, who is the head of the association in charge of the project on the Taiwan side, equated traditional Chinese characters with Chinese culture itself and declared that getting the masses in China to recognize them is an important mission. (Liu really needs to read Lü Shuxiang’s “Comparing Chinese Characters and a Chinese Spelling Script — an evening conversation on the reform of Chinese characters.”)

Then he went on about how Chinese characters are a great system because, supposedly, they have a one-to-one correspondence with language that other scripts cannot match and people can know what they mean by looking at them (!) and that they therefore have a high degree of artistic quality (gāodù de yìshùxìng). Basically, the person in charge of this project seems to have a bad case of the Like Wow syndrome, which is not a reassuring trait for someone in charge of producing a dictionary.

The same cooperation that built the Web sites led to a new book, Liǎng’àn Měirì Yī Cí (???????? / Roughly: Cross-Strait Term-a-Day Book), which was also touted at the press conference.

The book contains Hanyu Pinyin, as well as zhuyin fuhao. But, alas, the book makes the Pinyin look ugly and fails completely at the first rule of Pinyin: use word parsing. (In the online images from the book, such as the one below, all of the words are se pa ra ted in to syl la bles.)

The Web site also has ugly Pinyin, with the CSS file for the Taiwan site calling for Pinyin to be shown in SimSun, which is one of the fonts it’s better not to use for Pinyin. But the word parsing on the Web site is at least not always wrong. Here are a few examples.

  • “???” is given as pǎoshénr (good).
  • And apostrophes appear to be used correctly: e.g., fàn’?n (??), ch?n’?n (??), and f?i’?n (??).
  • But “???” is run together as “dìèrchūn” (no hyphen) rather than written correctly as dì-èr chūn.
  • And “??????” is given as yíge tóu liǎngge dà (for Taiwan) and yīge tóu liǎngge dà (for China). But ge is supposed to be written separately. (The variation of tone for yi is in this case useful.)

Still, my general impression from this is that we should not expect the forthcoming cross-strait dictionary to be very good.

Further reading:

A quarter century of Sino-Platonic Papers

These days, with the click of a mouse one can publish something that can instantly be seen by people around the world. But despite this ease it can still feel like a major accomplishment if someone has the tenacity to keep even a blog going past its first few years.

Consider, then, the days long before user-friendly blogging software, the days before blogs even. The days before desktop publishing was in the hands of more than a few, before most people had the ability to send or receive files electronically, before most people had even heard of the Internet. The days when typewriters were still common.

So these were also the days so long before Unicode that including Chinese characters or even common diacritics in a manuscript usually meant writing them in by hand.

The days when small-scale publishing meant trips to the copy shop and long sessions spent photocopying and stapling. When the international correspondence needed to issue a small journal meant trip after trip to the post office, paying postage to send something to what might well be the other side of the globe, and having to wait weeks, months, for a reply.

The days when receiving payment for issues meant paper checks sent through the regular mail and then taken during certain hours to the bank, where you would wait in line for a teller. And heaven help you with the endless paperwork and waiting if the check was not in U.S. dollars but a foreign currency.

The days when long-distance phone calls really cost something. And international calls? Ouch!

And all that’s on top of the many other challenges involved in running a peer-reviewed academic journal.

Those are just some of the situations Victor Mair had to deal with when his journal, Sino-Platonic Papers, was getting off the ground. And there have certainly been many challenges since.

So I think it’s worth noting that Sino-Platonic Papers has reached the age of twenty-five and is still going strong.

There are now more than two hundred issues, the majority of which are available in full for free on Sino-Platonic Papers’ Web site. The shortest issue is just four pages, while the longest to date stretches over three volumes and comprises approximately one thousand pages.

That this journal has published all manner of authors, from internationally renowned scholars to unaffiliated researchers out in the boondocks, helps demonstrate its willingness to take risks. (But, as Cameron Crowe reminds us, that’s how you become great.)

Sino-Platonic Papers has just released its thirteenth volume of book reviews (many of which are particular favorites of mine). But what is especially notable is that it marks the twenty-fifth anniversary of the beginning of this wide-ranging journal.

I congratulate SPP’s editor, Victor Mair, on this milestone.

Here’s what the anniversary issue covers.

  • Preface
  • Ancient China and Its Enemies: The Rise of Nomadic Power in East Asian History by Nicola Di Cosmo
  • The Prehistory of the Silk Road by E. E. Kuzmina, ed. Victor H. Mair
  • Mozi: A Complete Translation by Ian Johnston
  • Envisioning Eternal Empire: Chinese Political Thought of the Warring States Era by Yuri Pines
  • The Politics of Mourning in Early China by Miranda Brown
  • The Revelation of the Magi: The Lost Tale of the Wise Men’s Journey to Bethlehem by Brent Landau
  • A Story Waiting to Pierce You: Mongolia, Tibet and the Destiny of the Western World by Peter Kingsley
  • Rome and China: Comparative Perspectives on Ancient World Empires, ed. Walter Scheidel
  • The Camel’s Load in Life and Death: Iconography and Ideology of Chinese Pottery Figurines from Han to Tang and Their Relevance to Trade along the Silk Routes by Elfriede Regina Knauer
  • Ethnic Identity in Tang China by Marc Abramson
  • Mélanges tantriques à la mémoire d’Hélène Brunner/Tantric Studies in Memory of Hélène Brunner, ed. Dominic Goodall and André Padoux
  • Imperial China, 900-1800 by F. W. Mote
  • Local Religion in North China in the Twentieth Century: The Structure and Organization of Community Rituals and Beliefs by Daniel L. Overmyer
  • Tibetan Market Participation in China by Wang Shiyong
  • Chinese as It Is: A 3D Sound Atlas with First 1000 Characters by Conal Boyce
  • Language Choice and Identity Politics in Taiwan by Jennifer M. Wei
  • ABC English-Chinese, Chinese-English Dictionary, ed. John DeFrancis and Zhang Yanyin
  • Learning Chinese, Turning Chinese: Challenges to Becoming Sinophone in a Globalized World by Edward McDonald

Disclaimer: I volunteer as SPP’s technical editor and maintain its Web site. But I certainly didn’t have any such position twenty-five years ago!

Wenlin releases major upgrade (4.0)

One of my favorite programs, Wenlin (which bills itself as “software for learning Chinese”), has just released a major upgrade for both Mac and Windows versions. This doesn’t happen often; it has been three-and-a-half years since the most recent big change was issued (Wenlin 3.4) and heaven only knows how long since 3.0 came out. So, yes, this release has many substantial improvements.

One of the features nearest and dearest to my heart is that Wenlin 4.0 features greatly improved handling of Pinyin. I was among the field testers for the new version, so I’ve already spent a lot of time examining this feature. Here are a few important aspects of this:

  • Conversions from Chinese characters follow Hanyu Pinyin orthography much more closely than before. This is a major change for the better. (There’s still some room for improvement. But I don’t think we’ll have to wait years for this.)
  • In the past, using Wenlin to convert long texts in Chinese characters into Pinyin could be a real chore, with users having to examine example after example of Chinese characters with multiple pronunciations in order to select the proper pronunciation for that particular context. But now users may, if they so desire, tell Wenlin not to ask users for disambiguation input. Of course, that doesn’t mean that Wenlin will always guess right; but many users will be happy that this trade-off allows them to skip the frustration of, for example, having to tell the program over and over and over that, yes, in this case ? is pronounced shuō rather than shuì.
  • Relative newcomers to Mandarin may appreciate that for common words tone sandhi is indicated in Wenlin with additional marks (a dot or line below the vowel). This feature can also be turned off, for those who want standard Pinyin.

There are, of course, many improvements beyond the area of Pinyin. Here are a few:

  • One limitation of Wenlin 3.x was that its English dictionary wasn’t very large. But Wenlin 4.0 includes not only the ABC Chinese-English Comprehensive Dictionary but also the excellent new ABC English-Chinese, Chinese-English Dictionary (now finally in stock in the printed version).
  • The flashcards are now set up to handle not just individual characters but polysyllabic words.
  • There’s full Unicode Unihan 6.0 support for more than 75,000 Chinese characters.
  • And for those who think 75,000 just isn’t enough, users can now access Wenlin’s CDL technology. Through this, users can create new, variant, and rare characters; moreover, these can be published and shared with other Wenlin users or CDL-friendly devices.
  • Seal script versions of more than 11,000 characters are provided.
  • Wenlin contains an e-edition of the Shuowen Jiezi (Shuōwén Jiězì / 說文解字 / 说文解字).
  • Coders will be interested to know that Wenlin appears to be headed toward becoming open-source.
  • Both Mandarin and English entries are marked with grade levels, which aids learners by indicating relative frequency of use. The levels for Mandarin words are based on the Hanyu Shuiping Kaoshi (Hànyǔ Shuǐpíng Kǎoshì / 漢語水平考試 / 汉语水平考试 / HSK).

The full version (i.e., the CD with the program comes in a box and is likely packaged with a hard copy of the manual) is US$199, or US$179 if you download it from the Wenlin Web store. Upgrades from 3.x cost US$49.

For more information, see the summary of features and outline of what’s new in Wenlin 4.0.

screenshot from Wenlin 4.0

Xin Tang no. 1: articles in Gwoyeu Romatzyh

I’ve just put up another issue of Xin Tang.

As you may have noticed already, the name on the cover is given not as Xin Tang but as Shin Tarng. That’s because the journal started out being published in the Gwoyeu Romatzyh romanization system. But using the Hanyu Pinyin spelling here helps me keep track of these better.

Almost all of this issue is in Mandarin written in Gwoyeu Romatzyh. One article also has an en face translation into English. And as is the case with the other issues of Xin Tang, a variety of topics are covered.

Shin Tarng no. 1 (September/Jiǔyuè 1982)

ABC English-Chinese, Chinese-English Dictionary out soon

The ABC Chinese-English Dictionary was published ten years ago. It was revolutionary in that, for the first time, a Mandarin-English dictionary was ordered entirely by the headwords’ pronunciation as written in pinyin. (Stroke and radical indexes are also there to aid finding a character when its shape is known but not its pronunciation.) Other dictionaries in the DeFrancis ABC series have followed. But up to now there has been no ABC dictionary with an English to Mandarin section as well as a Mandarin to English one.

At the end of this month the University of Hawai`i Press is releasing the ABC English-Chinese, Chinese-English Dictionary. The new dictionary, which is 1,252 pages long, has 29,670 entries in its English-Mandarin section and 37,963 entries for Mandarin-English (total 67,633 entries). (The much larger ABC Chinese-English Comprehensive Dictionary has some 196,000 entries — all Mandarin-English).

This is a big year for Mandarin-English dictionaries, with the forthcoming release of the ABC ECCE and the release three months ago of the massive Oxford Chinese Dictionary. From the standpoint of Pinyin, however, the Oxford dictionary is a disappointment. For example, the Oxford dictionary has no Pinyin in the English-Mandarin section, just Chinese characters; in some other places tone marks are missing from some of the Pinyin, where it appears at all. Perhaps this will be rectified in the online edition, which has yet to appear. At the moment, though, the Oxford looks like a fairly traditional dictionary — albeit a huge one — aimed mainly at English learners in China, which isn’t necessarily a bad thing if you happen to be among that very large group of people. For more on the Oxford, see the video at Danwei and the entries at Chinese Forums (with some images) and Language Log.

Unlike the Oxford dictionary, the ABC ECCE offers both Pinyin and Chinese characters for all entries and sample sentences. (See samples below; more extensive examples are available as PDF files.)

From what I’ve seen so far of the ABC English-Chinese, Chinese-English Dictionary, I expect it to become the dictionary for English-speaking students of Mandarin. I’ll write more about this once I’m able to see a hard copy.

The ABC English-Chinese, Chinese-English Dictionary retails for only US$20, compared to US$75 for the Oxford.

From the Mandarin-English section. But don’t expect the text in the printed edition to be this large. I’ve enlarged the image to make it easier to read on the Web.
examples of entries in the Mandarin-English section of the ABC English-Chinese, Chinese-English Dictionary

From the English-Mandarin section:
examples of entries in the English-Mandarin section of the ABC English-Chinese, Chinese-English Dictionary

(ISBN-10: 0824834852; ISBN-13: 978-0824834852)

See also:

Xin Tang 6

My previous post linked to a new HTML version of Homographobia, an essay by John DeFrancis. The work was first published in November 1985, in the sixth issue of Xin Tang (New China).

Xin Tang (Xīn Táng) is an especially interesting journal in that it is primarily in Mandarin written in romanization. A variety of romanization systems and methods are employed; indeed, over the course of the journal’s run one can see many questions of systems and orthographies being worked out.

I want to stress, though, that the journal does not restrict itself to material of interest only to romanization specialists. It also features poetry, illustrated stories, philosophy, letters to the editor, children’s material, and much more.

English and a few Chinese characters are also found; and there are even articles in languages such as Turkish (with Mandarin and English translations).

Most of what appears in English is also translated into Mandarin — romanized Mandarin, of course. So DeFrancis’s essay also appears, appropriately, in Pinyin:

Homographobia is a disorder characterized by an irrational fear of ambiguity when individual lexical items which are now distinguished graphically lose their distinctive features and become identical if written phonemically. The seriousness of the disorder appears to be in direct proportion to the increase in number of items with identical spelling that phonemic rendering might bring about….

Tongyinci-kongjuzheng shi yi zhong xinli shang d shichang, tezheng shi huluande haipa yong pinyin zhuanxie dangqing kao zixing fende hen qingchu d cir hui shiqu tamend bianbiexing. Kan qilai, zhei ge bing d yanzhongxing gen pinyin shuxie keneng zaocheng d tongxing pinshi shuliang d zengjia cheng zhengbi….

The entire issue containing the DeFrancis essay is now online: Xin Tang no. 6.

illustration of a dragon reading a copy of Xin Tang, from an illustrated story
Note the occasional employment of a tonal spelling (shuui).