Aiyo! OED fails to use Pinyin for some new entries

The Oxford English Dictionary has just added some new entries, including several from Sinitic languages.

A lot of these come by way of Singapore and so reflect the Hokkien language. For example, among the new entries is “ang pow,” which is Hokkien’s equivalent of Mandarin’s “hongbao,” which also made the list.

A few of the entries, however, come from Mandarin, for example two common interjections for surprise. Oddly, though, the OED uses “aiyoh” and “aiyah” instead of their proper Pinyin spellings of “aiyo” and “aiya.”

“Ah,” you say, “but maybe the aiyoh and aiyah spellings are more common in English.”


Even in Singapore domains (.sg), the Pinyin spellings are more common than those the OED calls for. As the tables below show, in every instance the Pinyin spellings are also more common in Hong Kong, China, and Taiwan. Throughout the world, the Pinyin spellings are more common — the vast majority of the time by a factor of at least two.

Google search results for “aiyo” (Pinyin) and “aiyoh” (spelling used in the OED)

domain                      aiyo        aiyoh
.sg                         12,200      5,680
.hk                         2,570       187
.cn                         6,040       984
.tw                         4,690       196
all domains                 1,250,000   137,000
all domains + “chinese”     97,700      77,100
all domains + “mandarin”    51,800      14,100

Google search results for “aiya” (Pinyin) and “aiyah” (spelling used in the OED)

domain                      aiya        aiyah
.sg                         17,600      8,310
.hk                         6,400       2,360
.cn                         13,200      1,860
.tw                         5,910       1,710
all domains                 3,370,000   332,000
all domains + “chinese”     238,000     63,200
all domains + “mandarin”    36,500      22,800
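
As a rough check on the "factor of at least two" claim, the counts in the two tables above can be turned into ratios. What follows is only a minimal Python sketch: the dictionary layout and the function name are illustrative assumptions, while the figures are copied straight from the tables.

```python
# Google hit counts copied from the tables above: (Pinyin spelling, OED spelling).
# The structure and names here are illustrative, not part of the original data.
COUNTS = {
    "aiyo/aiyoh": {
        ".sg": (12_200, 5_680),
        ".hk": (2_570, 187),
        ".cn": (6_040, 984),
        ".tw": (4_690, 196),
        "all domains": (1_250_000, 137_000),
        'all domains + "chinese"': (97_700, 77_100),
        'all domains + "mandarin"': (51_800, 14_100),
    },
    "aiya/aiyah": {
        ".sg": (17_600, 8_310),
        ".hk": (6_400, 2_360),
        ".cn": (13_200, 1_860),
        ".tw": (5_910, 1_710),
        "all domains": (3_370_000, 332_000),
        'all domains + "chinese"': (238_000, 63_200),
        'all domains + "mandarin"': (36_500, 22_800),
    },
}


def ratios(counts):
    """Return {(pair, scope): pinyin_hits / oed_hits} for every table row."""
    return {
        (pair, scope): pinyin / oed
        for pair, rows in counts.items()
        for scope, (pinyin, oed) in rows.items()
    }


if __name__ == "__main__":
    for (pair, scope), r in ratios(COUNTS).items():
        flag = "" if r >= 2 else "  (less than double)"
        print(f"{pair:11} {scope:26} {r:6.1f}x{flag}")
```

On these figures the Pinyin spelling leads in all fourteen rows, and by at least two to one in twelve of them. The two exceptions are “aiyo” with “chinese” and “aiya” with “mandarin.”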

Searching Google Books also reveals that the Pinyin forms are more common.

In short, I do not see any good reason for the OED to have adopted ad hoc spellings rather than the Pinyin standard. They must have their reasons, but it looks like they botched this.

Biscriptal butt texting

Now there’s a headline you don’t see every day.

I’ve had mobile phones for years but never butt-dialed or butt-texted anyone … until a couple of months ago, when I seemed to make up for lost time by sending off a series of messages and Line calls to one of my wife’s relatives. To make matters worse, this relative is in the States, where it was then after midnight.

Anyway, the messages start off in nonsense English and then switch mainly to nonsense Mandarin.

Most of the Chinese characters are isolated and have no semantic relationship to those around them. Predictably, most of the characters are for a few simple sounds:

  • 凹 [āo] — concave
  • 鞥 [ēng] — quite rare: leading rein (of a horse)

But there are a few instances of at least two characters working together:

  • 偶爾 ǒu’ěr (“occasionally”)
  • 怨偶 yuàn’ǒu (“unhappy couple”)
  • 鱷魚 èyú (“crocodile”)

So just in case anyone has ever wondered what butt texting in Chinese characters looks like, here you go. People whose phones have different methods for inputting Chinese characters will likely see somewhat different results.

    composite screenshot of a series of text messages sent in garbage English and garbage Mandarin Chinese (in Chinese characters)

I took several screenshots and stitched them together in Photoshop.

Shanghai considers deleting Pinyin from street signs

The Shanghai Road Administration Bureau is considering removing Hanyu Pinyin from street signs in the city.

Typically, the bureau’s division chief, Wang Weifeng, seems to be confused about the difference between Pinyin and English. He also justifies the move by claiming that larger Chinese characters would benefit Chinese citizens, ignoring the high number of people in China who are largely illiterate.

“Of course we will keep the English-Chinese traffic signs around some special areas, such as the tourism spots, CBD areas and some transport hubs,” Wang said.

A German newspaper article notes:

Ob sie die Umschrift wortwörtlich „aus dem Verkehr“ zieht, will Schanghai angeblich von einer „Umfrage“ unter „Anwohnern“ abhängig machen, ebenso vom Urteil nicht näher genannter „Experten“. Dies ist eine gängige Formulierung, wenn chinesische Regierungsstellen ihren einsamen Entscheidungen einen basisdemokratischen Anstrich geben wollen.

[Translation: Whether it literally takes the romanization “out of circulation,” Shanghai supposedly intends to make dependent on a “survey” of “residents,” as well as on the judgment of unnamed “experts.” This is a common formulation when Chinese government agencies want to give their solitary decisions a veneer of grassroots democracy.]

This is a situation all too common in Taiwan as well, such as in Taipei’s misguided move to apply nicknumbering to subway stops. “Experts” — ha!

Shanghai’s survey on Pinyin use and signage is of course in Mandarin only, with no English. The poll ends on August 30 (next week!), so add your views to that soon.

So far, public opinion seems to be largely against removing Hanyu Pinyin from signs. But that doesn’t mean this might not happen anyway. After all: Shanghai has its “experts” on the case. Heh.

If Shanghai really wanted to help the legibility of its signs, it should consider using word parsing even with text in Chinese characters. For example:

  • use 陕西 南路, not 陕西南路
  • use 斜土 路, not 斜土路
  • use 建国 西路, not 建国西路

That would also permit the use of superscript on the generic parts of names (e.g., “南路”) to save space. This could also be done with the Pinyin/English, with the Pinyin in large letters and the English “Rd” etc. in superscript.

Thanks to Michael Cannings for the tip.


Languages, scripts, and signs: a walk around Taipei’s Shixin University

Recently I took some trails through the mountains in Taipei and ended up at Shih Hsin University (Shìxīn Dàxué / 世新大學). Near the school are some interesting signs. Rather than giving individual posts for each of these, I’m keeping the signs together in this one, as this is better testimony to the increasing and often playful diversity of languages and scripts in Taiwan.

Cǎo Chuàn

Here’s a restaurant whose name is given in Pinyin with tone marks! That’s quite a rarity here, though I suspect we’ll be seeing more of this in the future. The name in Chinese characters (草串) can be found, much smaller, on a separate sign below.



Right by Cao Chuan is Èrgē de Niúròumiàn (Second Brother’s Beef Noodle Soup). Note the use of the Japanese の rather than Mandarin’s 的; this is quite common in Taiwan.



This store has an ㄟ, which serves as a marker of the Taiwanese language. Here, ㄟ is the equivalent of 的 — and of の.

Bālè ei diàn

A’Woo Tea Bar


I couldn’t find a name in Chinese characters for this place. The name is probably onomatopoeia, as in “Werewolves of London — awoo!”

Taiwan presidential campaign logos

I’m far behind on writing about Taiwan’s upcoming election. The logos for the two main candidates in the presidential race were revealed about a month ago.

First up is the presidential campaign logo for Tsai Ing-wen (蔡英文 / Cài Yīngwén): “LIGHT UP TAIWAN 點亮台灣” (Diǎn liàng Táiwān).


And here is the campaign logo for the Kuomintang’s presidential candidate, Hung Hsiu-chu (Hóng Xiùzhù / 洪秀柱), er, Eric Chu (朱立倫 / Zhū Lìlún): “ONE TAIWAN 台灣就是力量” (Táiwān jiùshì lìliang).


It’s hard not to be struck by the fact that both prominently feature English slogans even though Taiwan has a distinct shortage of English-speaking Westerners who are eligible to vote here. (And, anyway, most such immigrants can read the Chinese characters.) For that matter, in both logos the English slogan comes first. That’s how cool and modern English is seen to be in Taiwan, even though it’s not an official language here. Coincidentally, one of the candidates is even named “Ing-wen” (“English language” / Yīngwén / 英文).

Sure, it’s window dressing; but it’s still window dressing in English.

In 2012 both major candidates had English slogans. Ma Ying-jeou used “Taiwan bravo,” and Tsai Ing-wen used “Taiwan next,” though Ma didn’t make such prominent use of English then as Chu is doing this year. My impression is that the Democratic Progressive Party embraced English much earlier than the Kuomintang but that the KMT has since caught up with the DPP in this.

And, as was the case in the previous election, I’d like to note that both candidates used “台灣” rather than “臺灣” for “Taiwan,” despite the Ma administration’s declaration that the latter is the proper form.

further reading: Platform on tai?, Pinyin News, December 30, 2011

Milk Shop

Here’s another in my series of photos of English with Chinese character(istic)s, that is, Chinese characters being used to write English (sort of). I want to stress that these aren’t loan words, just an approximate phonetic rendering of the English.

Today’s entry — which was taken a few weeks ago in Xinzhu (usually spelled “Hsinchu”), Taiwan — is Mi2ke4 Xia4 (lit. “lost guest summer”).

sign for a drinks store, labeled 'milk shop' in English and 'mi ke xia' in Chinese characters


I tend to think of Hanzi being used to write English words as “Singlish,” after John DeFrancis’s classic spoof, “The Singlish Affair,” which is the opening chapter of his essential book The Chinese Language: Fact and Fantasy. But these days the word is mainly used for Singaporean English. So now I usually go with something like “English with Chinese character(istic)s.”

For a few earlier examples, see my photos of the dog and the butterfly businesses.

Today’s example is “Crunchy,” written as ke3 lang3 qi2 (can bright strange). Kelangqi, however, isn’t how to say “crunchy” in Mandarin (cui4 de is); it’s just an attempt to render the English word using Chinese characters, probably in an attempt to look different and cool.

Sign advertising a store named 'Crunchy' in English and 'ke lang qi' (in Chinese characters) in Mandarin

Crunchy, which is now out of business, was just a block away from the Dog (dou4 ge2) store, which is still around.

New database of cross-strait differences in Mandarin goes online

Last week, on the same day President Ma Ying-jeou accepted the resignation of a minister who made some drunken lewd remarks at a wěiyá (year-end office party), Ma was joking to the media about blow jobs.


screenshot from a video of a news story on this

But it was all for a good cause, of course. You see, the Mandarin expression chuī lǎba, when not referring to the literal playing of a trumpet, is usually taken in Taiwan to refer to a blow job. But in China, Ma explained, chuī lǎba means the same thing as the idiom pāi mǎpì (pat/kiss the horse’s ass — i.e., flatter). And now that we have the handy-dandy Zhōnghuá Yǔwén Zhīshikù (Chinese Language Database), which Ma was announcing, we can look up how Mandarin differs in Taiwan and China, and thus not get tripped up by such misunderstandings. Or at least that’s supposed to be the idea.

The database, which is the result of cross-strait cooperation, can be accessed via two sites: one in Taiwan, the other in China.

It’s clear that a lot of money has been spent on this. For example, many entries are accompanied by well-documented, precise explanations by distinguished lexicographers. Ha! Just kidding! Many entries are really accompanied by videos (some two hundred of them) of cutesy puppets gabbing about cross-strait differences in Mandarin expressions. But if there’s a video in there of the panda in the skirt explaining to the sheep in the vest that a useful skill for getting ahead in Chinese society is chuī lǎba, I haven’t found it yet. Will NMA take up the challenge?

Much of the site emphasizes not so much language as Chinese characters. For example, another expensively produced video feeds the ideographic myth by showing off obscure Hanzi, such as the one for chěng.

WARNING: The screenshot below links to a video that contains scenes with intense wawa-ing and thus may not be suitable for anyone who thinks it’s not really cute for grown women to try to sound like they’re only thwee-and-a-half years old.


In a welcome bit of synchronicity, Victor Mair posted on Language Log earlier the same week on the unpredictability of Chinese character formation and pronunciation, briefly discussing just such patterns of duplication, triplication, etc.

Mair notes:

Most of these characters are of relatively low frequency and, except for a few of them, neither their meanings nor their pronunciations are known by persons of average literacy.

Many more such characters consisting of two, three, or four repetitions of the same character exist, and their sounds and meanings are in most cases equally or more opaque.

The Hanzi for chěng (which looks like 馬馬馬 run together as one character) in the video above is sufficiently obscure that it likely won’t be shown correctly in many browsers on most systems when written in real text: 𩧢. But never fear: It’s already in Unicode and so should be appearing one of these years in a massively bloated system font.
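
The display problem has a lot to do with where the character sits in Unicode. A short Python sketch, assuming the character above has survived copying intact, can report whether it lies outside the Basic Multilingual Plane, which is the range that older fonts and software handle most reliably. The function name and the choice of fields are illustrative only.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Report basic Unicode facts about a single character."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        # Fall back to a placeholder if this Python's Unicode data lacks the name.
        "name": unicodedata.name(ch, "(name not in this Unicode database)"),
        # Code points above U+FFFF are outside the Basic Multilingual Plane.
        "outside_bmp": cp > 0xFFFF,
        # Such characters need a surrogate pair (two code units) in UTF-16.
        "utf16_code_units": len(ch.encode("utf-16-le")) // 2,
        "utf8_bytes": len(ch.encode("utf-8")),
    }


if __name__ == "__main__":
    for key, value in describe("𩧢").items():
        print(f"{key}: {value}")
```

Running the same function on an everyday character such as 馬 gives a point of comparison: common Hanzi sit inside the Basic Multilingual Plane and take a single UTF-16 code unit.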

Further reinforcing the impression that the focus is on Chinese characters, Liú Zhàoxuán, who is the head of the association in charge of the project on the Taiwan side, equated traditional Chinese characters with Chinese culture itself and declared that getting the masses in China to recognize them is an important mission. (Liu really needs to read Lü Shuxiang’s “Comparing Chinese Characters and a Chinese Spelling Script — an evening conversation on the reform of Chinese characters.”)

Then he went on about how Chinese characters are a great system because, supposedly, they have a one-to-one correspondence with language that other scripts cannot match and people can know what they mean by looking at them (!) and that they therefore have a high degree of artistic quality (gāodù de yìshùxìng). Basically, the person in charge of this project seems to have a bad case of the Like Wow syndrome, which is not a reassuring trait for someone in charge of producing a dictionary.

The same cooperation that built the Web sites led to a new book, Liǎng’àn Měirì Yī Cí (《兩岸每日一詞》 / Roughly: Cross-Strait Term-a-Day Book), which was also touted at the press conference.

The book contains Hanyu Pinyin, as well as zhuyin fuhao. But, alas, the book makes the Pinyin look ugly and fails completely at the first rule of Pinyin: use word parsing. (In the online images from the book, such as the one below, all of the words are se pa ra ted in to syl la bles.)

The Web site also has ugly Pinyin, with the CSS file for the Taiwan site calling for Pinyin to be shown in SimSun, which is one of the fonts it’s better not to use for Pinyin. But the word parsing on the Web site is at least not always wrong. Here are a few examples.

  • “跑神兒” is given as pǎoshénr (good).
  • And apostrophes appear to be used correctly: e.g., fàn’ān (販安), chūn’ān (春安), and fēi’ān (飛安).
  • But “第二春” is run together as “dìèrchūn” (no hyphen) rather than shown correctly as dì-èr chūn.
  • And “一個頭兩個大” is given as yíɡe tóu liǎnɡɡe dà (for Taiwan) and yīɡe tóu liǎnɡɡe dà (for China). But ge is supposed to be written separately. (The variation of tone for yi is in this case useful.)
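
The apostrophe rule behind the second example is mechanical: within a word, a non-initial syllable that begins with a, o, or e is preceded by an apostrophe. Here is a minimal Python sketch of that one rule. The function name and the list-of-syllables input are assumptions made for illustration, not anything the database itself uses, and the sketch says nothing about word parsing or hyphens.

```python
import unicodedata


def join_syllables(syllables):
    """Join Pinyin syllables into one word, inserting an apostrophe before
    any non-initial syllable that begins with a, o, or e."""
    word = syllables[0]
    for syl in syllables[1:]:
        # Decompose so that a tone-marked first letter (e.g. "ā", "ǒ", "è")
        # is recognized by its base letter.
        first = unicodedata.normalize("NFD", syl)[0].lower()
        if first in "aoe":
            word += "\u2019"  # typographic apostrophe, as in the examples above
        word += syl
    return word


if __name__ == "__main__":
    print(join_syllables(["fàn", "ān"]))     # fàn’ān
    print(join_syllables(["pǎo", "shénr"]))  # pǎoshénr
    print(join_syllables(["yuàn", "ǒu"]))    # yuàn’ǒu
```

The first syllable of a word never gets an apostrophe, which is why èyú is written without one.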

Still, my general impression from this is that we should not expect the forthcoming cross-strait dictionary to be very good.

Further reading: