Aiyo! OED fails to use Pinyin for some new entries

The Oxford English Dictionary has just added some new entries, including several from Sinitic languages.

A lot of these come by way of Singapore and so reflect the Hokkien language. For example, among the new entries is “ang pow,” which is Hokkien’s equivalent of Mandarin’s “hongbao,” which also made the list.

A few of the entries, however, come from Mandarin, for example two common interjections for surprise. Oddly, though, the OED uses “aiyoh” and “aiyah” instead of their proper Pinyin spellings of “aiyo” and “aiya.”

“Ah,” you say, “but maybe the aiyoh and aiyah spellings are more common in English.”


Even in Singapore domains (.sg), the Pinyin spellings are more common than those the OED calls for. As the tables below show, in every instance the Pinyin spellings are also more common in Hong Kong, China, and Taiwan. Throughout the world, the Pinyin spellings are more common — the vast majority of the time by a factor of at least two.

Google search results for “aiyo” (Pinyin) and “aiyoh” (spelling used in the OED)

  aiyo aiyoh
.sg 12,200 5,680
.hk 2,570 187
.cn 6,040 984
.tw 4,690 196
all domains 1,250,000 137,000
all domains  + “chinese” 97,700 77,100
all domains  + “mandarin” 51,800 14,100

Google search results for “aiya” (Pinyin) and “aiyah” (spelling used in the OED)

  aiya aiyah
.sg 17,600 8,310
.hk 6,400 2,360
.cn 13,200 1,860
.tw 5,910 1,710
all domains 3,370,000 332,000
all domains  + “chinese” 238,000 63,200
all domains  + “mandarin” 36,500 22,800

Searching Google Books also reveals that the Pinyin forms are more common.

In short, I do not see any good reason for the OED to have adopted ad hoc spellings rather than the Pinyin standard. They must have their reasons, but it looks like they botched this.

Pinyin sort order

The standard for alphabetically sorting Hanyu Pinyin is given in the ABC dictionary series edited by John DeFrancis and issued by the University of Hawaii Press.

Here’s the basic idea:

The ordering is primarily simply alphabetical. Diacritical marks, punctuation, juncture and capitalization are only taken into account when the strings being compared are otherwise identical. For example, píng’ān sorts before pīnyīn, because pingan sorts before pinyin, because g precedes y alphabetically.

Only when two strings are alphabetically identical is non-alphabetical information taken into account.

The series’ Reader’s Guide presents the specifics of the sort order. Since I don’t have to worry about how much space this takes up on my site, I have reformatted the information slightly to give the examples as numbered lists.

Head entry transcriptions with the same sequence of letters are ordered first strictly by letter sequence regardless of tones, then by initial syllable tone in the sequence 0 1 2 3 4. For entries with the same initial tone, arrangement is by the tone of the second syllable, again in the order 0 1 2 3 4. For example:

  • shīshi
  • shīshī
  • shīshí
  • shīshǐ
  • shīshì
  • shíshī
  • shíshì
  • shǐshī
  • shìshī
  • Irrespective of tones, entries with the vowel u precede those with ü.
    For example:

    Entries without apostrophe precede those with apostrophe. For example:

    1. biànargue
    2. bǐ’ànthe other shore

    Lower-case entries precede upper-case entries. For example:

    1. hòujìnaftereffect
    2. Hòu JìnLater Jin dynasty

    For entries with identical spelling, including tones, arrangement is by order of frequency….

    For most users, the most important thing to note is that the neutral tone is regarded as 0, not as 5. Thus, the order is notā á ǎ à a,” but “a ā á ǎ à.” And, because lowercase comes before uppercase, notA a Ā ā Á á Ǎ ǎ À à” but “a A ā Ā á Á ǎ Ǎ à À.”

    One can see this in action in the A entries for the ABC English-Chinese, Chinese-English Dictionary. And here are some sample pages from an earlier ABC dictionary.

    The ABC series follows the example of the Hanyu Pinyin Cihui (汉语拼音词汇 / 漢語拼音詞彙 / Hànyǔ Pīnyīn Cíhuì) (example), with only one minor difference, as noted by Tom Bishop:

    HPC [Hanyu Pinyin Cihui] gave hyphens and spaces the same priority as apostrophes, so that lìgōng sorted before lǐ-gōng, in spite of the tones. Usage of hyphens and spaces in pinyin is still far from being fully standardized. (The same is true in English orthography.) Consequently, for collation it makes sense to give less weight to hyphens and spaces, and more weight to tones, thus sorting lǐ-gōng before lìgōng. In ABC, hyphens and spaces don’t affect the sort order unless they change the pronunciation in the same way that apostrophe would; for example, ¹míng-àn 明暗 and ²míng’àn 冥暗 are treated as homophones, and they sort after mǐngǎn 敏感.

    New database of cross-strait differences in Mandarin goes online

    Last week, on the same day President Ma Ying-jeou accepted the resignation of a minister who made some drunken lewd remarks at a wěiyá (year-end office party), Ma was joking to the media about blow jobs.


    screenshot from a video of a news story on this

    But it was all for a good cause, of course. You see, the Mandarin expression chuī lǎba, when not referring to the literal playing of a trumpet, is usually taken in Taiwan to refer to a blow job. But in China, Ma explained, chuī lǎba means the same thing as the idiom pāi mǎpì (pat/kiss the horse’s ass — i.e., flatter). And now that we have the handy-dandy Zhōnghuá Yǔwén Zhīshikù (Chinese Language Database), which Ma was announcing, we can look up how Mandarin differs in Taiwan and China, and thus not get tripped up by such misunderstandings. Or at least that’s supposed to be the idea.

    The database, which is the result of cross-strait cooperation, can be accessed via two sites: one in Taiwan, the other in China.

    It’s clear that a lot of money has been spent on this. For example, many entries are accompanied by well-documented, precise explanations by distinguished lexicographers. Ha! Just kidding! Many entries are really accompanied by videos — some two hundred of them — of cutesy puppets gabbing about cross-strait differences in Mandarin expressions. But if there’s a video in there of the panda in the skirt explaining to the sheep in the vest that a useful skill for getting ahead in Chinese society is chuī lǎba, I haven’t found it yet. Will NMA will take up the challenge?

    Much of the site emphasizes not so much language as Chinese characters. For example, another expensively produced video feeds the ideographic myth by showing off obscure Hanzi, such as the one for chěng.

    WARNING: The screenshot below links to a video that contains scenes with intense wawa-ing and thus may not be suitable for anyone who thinks it’s not really cute for grown women to try to sound like they’re only thwee-and-a-half years old.


    In a welcome bit of synchronicity, Victor Mair posted on Language Log earlier the same week on the unpredictability of Chinese character formation and pronunciation, briefly discussing just such patterns of duplication, triplication, etc.

    Mair notes:

    Most of these characters are of relatively low frequency and, except for a few of them, neither their meanings nor their pronunciations are known by persons of average literacy.

    Many more such characters consisting or two, three, or four repetitions of the same character exist, and their sounds and meanings are in most cases equally or more opaque.

    The Hanzi for chěng (which looks like 馬馬馬 run together as one character) in the video above is sufficiently obscure that it likely won’t be shown correctly in many browsers on most systems when written in real text: 𩧢. But never fear: It’s already in Unicode and so should be appearing one of these years in a massively bloated system font.

    Further reinforcing the impression that the focus is on Chinese characters, Liú Zhàoxuán, who is the head of the association in charge of the project on the Taiwan side, equated traditional Chinese characters with Chinese culture itself and declared that getting the masses in China to recognize them is an important mission. (Liu really needs to read Lü Shuxiang’s “Comparing Chinese Characters and a Chinese Spelling Script — an evening conversation on the reform of Chinese characters.”)

    Then he went on about how Chinese characters are a great system because, supposedly, they have a one-to-one correspondence with language that other scripts cannot match and people can know what they mean by looking at them (!) and that they therefore have a high degree of artistic quality (gāodù de yìshùxìng). Basically, the person in charge of this project seems to have a bad case of the Like Wow syndrome, which is not a reassuring trait for someone in charge of producing a dictionary.

    The same cooperation that built the Web sites led to a new book, Liǎng’àn Měirì Yī Cí (《兩岸每日一詞》 / Roughly: Cross-Strait Term-a-Day Book), which was also touted at the press conference.

    The book contains Hanyu Pinyin, as well as zhuyin fuhao. But, alas, the book makes the Pinyin look ugly and fails completely at the first rule of Pinyin: use word parsing. (In the online images from the book, such as the one below, all of the words are se pa ra ted in to syl la bles.)

    The Web site also has ugly Pinyin, with the CSS file for the Taiwan site calling for Pinyin to be shown in SimSun, which is one of the fonts it’s better not to use for Pinyin. But the word parsing on the Web site is at least not always wrong. Here are a few examples.

    • “跑神兒” is given as pǎoshénr (good).
    • And apostrophes appear to be used correctly: e.g., fàn’ān (販安), chūn’ān (春安), and fēi’ān (飛安).
    • But “第二春” is run together as “dìèrchūn” (no hyphen) rather than as shown correctly as dì-èr chūn.
    • And “一個頭兩個大” is given as yíɡe tóu liǎnɡɡe dà (for Taiwan) and yīɡe tóu liǎnɡɡe dà (for China). But ge is supposed to be written separately. (The variation of tone for yi is in this case useful.)

    Still, my general impression from this is that we should not expect the forthcoming cross-strait dictionary to be very good.

    Further reading:

    How to handle ‘de’ and interjections in Hanyu Pinyin

    cover image for the bookToday’s selection from Yin Binyong’s X?nhuá P?nxi? Cídi?n (???????? / ????????) deals with how to write Mandarin’s various de‘s, mood particles, and interjections.

    This reading is available in two versions:

    • simplified Chinese characters: ???? ????? (zhùcí, tàncí)
    • traditional Chinese characters: ???? ?????

    I’ve already written about the principles in previous posts. For example, see

    How to write numbers and measure words in Hanyu Pinyin

    cover image for the bookToday’s selection from Yin Binyong’s X?nhuá P?nxi? Cídi?n (???????? / ????????) is about writing numbers and measure words.

    This reading is available in two versions:

    For more on this, see these posts and the PDFs linked to therein.

    How to write verbs in Hanyu Pinyin (Mandarin text)

    cover image for the book

    Here’s the first of several selected readings from Yin Binyong’s X?nhuá P?nxi? Cídi?n (???????? / ????????). It covers the writing of verbs.

    This reading is available in two versions:

    • simplified Chinese characters: ???? ??
    • traditional Chinese characters: ???? ??

    For those who would like to read about this in English, see

    important book on Pinyin to be excerpted on this site

    cover image for the bookX?nhuá P?nxi? Cídi?n (???????? / ????????), is the second of Yin Binyong’s two books on Pinyin orthography. The first, Chinese Romanization: Pronunciation and Orthography, is in English and Mandarin; much of it is already available here on Pinyin.Info.

    Although Xinhua Pinxie Cidian is only in Mandarin, the large number of examples makes it easy to get the point even if you may not read Mandarin in Chinese characters very well.

    This week I will begin posting some excerpts from this invaluable work. What’s more, I have made a version in traditional Chinese characters, which I hope that readers in Taiwan, Hong Kong, and elsewhere will take advantage of. So those not used to reading simplified Chinese characters will have a choice (which is more than the government of Taiwan is providing these days).

    I’m extremely happy to be able to bring you this information and wish to acknowledge the generosity of the Commercial Press. Stay tuned.

    Oxford Chinese Dictionary goes online

    cover image of the Oxford Chinese DictionaryOxford University Press has just announced that its massive Oxford Chinese Dictionary is now available through its Oxford Language Dictionaries Online subscription service.

    I haven’t seen the online version yet myself; but from the publisher’s description it appears to be largely the same as the published edition, whose paucity of Pinyin is disappointing. The publisher, however, is promising that “Pinyin will be added to all Chinese translations” in November, which should be a major step forward.

    Perhaps some of you at universities have institutional access. I would welcome reports.

    source: What’s New, Oxford Language Dictionaries Online, May 2011.