Mandarin words with more than one apostrophe

qīng’ěr’értīng
傾耳而聽
listen attentively

As I often note, apostrophes are used in only about 2 percent of words as written in Hanyu Pinyin. But when they’re needed, they’re needed. Don’t skip them.

A few years back, someone wrote to me to ask about multiple apostrophes in Pinyin. I dug through a 2019 edition of the CC-CEDICT (2019-11-12 04:41:56 GMT) for an answer. But I don’t think I ever posted my findings online. It’s time to rectify that.

CC-CEDICT is not an ideal source for counting words, because some of its entries are phrases rather than single words, and those phrases are not marked differently from words. Some entries, therefore, might be better written with spaces rather than apostrophes, which would reduce both the apostrophe count and the percentage.

So, with that in mind, of the file’s 117,579 entries, 3,006 needed apostrophes, or 2.56 percent.

No entry needed three or more apostrophes.

Only 52 entries needed two apostrophes, or 0.04 percent of the total (1 per 2,261 entries).
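A tally of this kind can be sketched in Python. This is not the script behind the figures above, only a minimal version under stated assumptions. It reads the standard CC-CEDICT line format (traditional characters, simplified characters, numbered Pinyin in square brackets, then slash-separated definitions), treats each entry as a single word, as the counts above do, and applies the standard rule that a syllable beginning with a, o, or e takes an apostrophe when it follows another syllable. The function names are hypothetical, and treating punctuation such as the middle dot in foreign names as a word break is an assumption.

```python
import re
import sys
from collections import Counter

# A CC-CEDICT entry line: Traditional Simplified [numbered pinyin] /def 1/def 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def apostrophes_needed(numbered_pinyin):
    """Count the apostrophes needed if the whole entry is written as one word.

    A syllable starting with a, o, or e gets an apostrophe when it follows
    another syllable. Tokens that do not start with a letter (such as the
    middle dot in foreign names) are treated as word breaks (an assumption).
    """
    count = 0
    after_syllable = False
    for token in numbered_pinyin.split():
        if not token[0].isalpha():
            after_syllable = False  # punctuation: the next syllable starts a word
            continue
        if after_syllable and token[0].lower() in "aoe":
            count += 1
        after_syllable = True
    return count


def tally(lines):
    """Return a Counter mapping apostrophe count -> number of entries."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # header and comment lines
        match = ENTRY_RE.match(line)
        if match:
            counts[apostrophes_needed(match.group(3))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_apostrophes.py cedict.txt  (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8") as f:
        counts = tally(f)
    total = sum(counts.values())
    for n in sorted(counts):
        print(f"{n} apostrophe(s): {counts[n]:,} entries ({counts[n] / total:.2%})")
```

Given the lines of a CC-CEDICT file, `tally` returns a Counter mapping each apostrophe count to the number of entries. The exact totals will vary with the edition of the file and with how phrase entries and punctuation are handled.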

Most of those 52 were simply Mandarinized foreign proper nouns. For example:

  • Ā’ěrjí’ěr: Algiers, capital of Algeria/ 阿爾及爾 阿尔及尔
  • Āi’ěrduō’ān: Erdogan (name)/Recep Tayyip Erdoğan (1954-), Turkish politician, prime minister from 2003/ 埃爾多安 埃尔多安
  • Běi’ài’ěrlán: Northern Ireland/ 北愛爾蘭 北爱尔兰
  • Bì’ěrbā’è: Bilbao (city in Spain)/ 畢爾巴鄂 毕尔巴鄂
  • Dá’ěrfú’ěr: Darfur (western province of Sudan)/ 達爾福爾 达尔福尔
  • Dá’ěrfù’ěr: Darfur, region of west Sudan/ 達爾富爾 达尔富尔
  • fēi’ābèi’ěr: (math.) non-abelian/ 非阿貝爾 非阿贝尔
  • Fèi’àoduō’ěr: Theodor or Fyodor (name)/ 費奧多爾 费奥多尔
  • gǔ’ānxiān’àn: glutamine (Gln), an amino acid/ 谷氨酰胺 谷氨酰胺
  • Láiwàng’è’ěr: Levanger (city in Trøndelag, Norway)/ 萊旺厄爾 莱旺厄尔
  • Léi’ā’ěrchéng: Ciudad Real/ 雷阿爾城 雷阿尔城
  • Luójié’ài’ěrzhī: Raziel, archangel in Judaism/ 羅潔愛爾之 罗洁爱尔之
  • Mài’ěrwéi’ěr: Melville (name)/Herman Melville (1819-1891), US novelist, author of Moby Dick/ 麥爾維爾 麦尔维尔
  • Pí’āi’ěr: Pierre (name)/ 皮埃爾 皮埃尔
  • Shàng’àisè’ěr: Overijssel/ 上艾瑟爾 上艾瑟尔
  • Sīfú’ěrwǎ’ěr: Svolvær (city in Nordland, Norway)/ 斯福爾瓦爾 斯福尔瓦尔
  • Sītài’ēnxiè’ěr: Steinkjær (city in Trøndelag, Norway)/ 斯泰恩謝爾 斯泰恩谢尔
  • Tèlǔ’āi’ěr: Tergüel or Teruel, Spain/ 特魯埃爾 特鲁埃尔
  • Xīn’ào’ěrliáng: New Orleans, Louisiana/ 新奧爾良 新奥尔良

Examples of more regular Mandarin entries with two apostrophes include:

  • bái’éyàn’ōu: (bird species of China) little tern (Sternula albifrons)/ 白額燕鷗 白额燕鸥
  • báixuě’ái’ái: brilliant white snow cover (esp. of distant peaks)/ 白雪皚皚 白雪皑皑
  • chū’ěrfǎn’ěr: old: to reap the consequences of one’s words (idiom, from Mencius); modern: to go back on one’s word/to blow hot and cold/to contradict oneself/inconsistent/ 出爾反爾 出尔反尔
  • húnhún’è’è: muddleheaded/ 渾渾噩噩 浑浑噩噩
  • pāi’àn’érqǐ: lit. to slap the table and stand up (idiom); fig. at the end of one’s tether/unable to take it any more/ 拍案而起 拍案而起
  • qì’áng’áng: full of vigor/spirited/valiant/ 氣昂昂 气昂昂
  • qīng’ěr’értīng: to listen attentively/ 傾耳而聽 倾耳而听
  • qīqī’ài’ài: stammering (idiom)/ 期期艾艾 期期艾艾
  • suíyù’ér’ān: at home wherever one is (idiom); ready to adapt/flexible/to accept circumstances with good will/ 隨遇而安 随遇而安
  • xiù’ēn’ài: to make a public display of affection/ 秀恩愛 秀恩爱
  • yǐ’échuán’é: to spread falsehoods/to increasingly distort the truth/to pile errors on top of errors (idiom)/ 以訛傳訛 以讹传讹

A few of those present interesting questions in orthography. For example, Xīn’ào’ěrliáng or Xīn Ào’ěrliáng?

But, basically, those entries are outliers. Relatively few words in Pinyin need an apostrophe; only a minute subset of those need two apostrophes; and, to my knowledge, none need three or more apostrophes.

Can you think of any triple-apostrophe words? Sorry, written examples of stuttering don’t count.

Large Mongolian-Korean dictionary released

Photo of the Mongolian-Korean dictionary

Dankook University’s Mongolian Research Institute has released what is being called the world’s largest Mongolian dictionary (actually a Mongolian–Korean dictionary), the 몽한대사전 (Great Mongolian–Korean Dictionary).

The two-volume work, which was more than ten years in the making, has some 85,000 headwords and more than 3,000 pages.

Source:
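단국대, 15년만에 세계 최대 몽골어 사전 ‘몽한대사전’ 편찬 (Dankook University completes the world’s largest Mongolian dictionary, the ‘Great Mongolian–Korean Dictionary,’ after 15 years), Donga Ilbo, April 5, 2023.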

Dungan-English Dictionary published

Eastbridge Books, an imprint of Camphor Press, is pleased to announce the publication of its Dungan-English Dictionary, by Olli Salmi.

Dungan-English Dictionary sample page spread

Dungan is interesting for Chinese studies because it has an alphabetic orthography. It is also important because it shows very little influence from the Chinese literary language. It has preserved original features of the local dialects of about 150 years ago. It also has loans from Persian and Arabic, from Turkic languages, and from Russian.

The Dungans are Muslims who fled China for Russian territory in Central Asia after the failure of the Dungan Revolt (1862-1877). Their language, which UNESCO classifies as “definitely endangered,” is related to northwestern Mandarin Chinese. Dungan has two main dialects: the so-called Gansu dialect, which is similar to the Muslim Chinese communal dialects in the southern part of the province of Xinjiang, and the Shaanxi dialect, which has more in common with the dialects of southern Shaanxi around Xi’an. In the Soviet Union, an alphabetic orthography and a literary language were developed for the Gansu dialect.

Although Dungan is now spoken primarily outside of China and employs an alphabet rather than Chinese characters, it is not really a peripheral dialect of Chinese. The Dungan Revolt started near Xi’an, Shaanxi, the cradle of Chinese civilization and the capital of the country under many dynasties. (This is where the terracotta soldiers were buried.) The speakers who gave rise to Gansu Dungan came from an area west of the Shaanxi speakers’ homeland, but one that was still entirely Chinese-speaking.

This dictionary is based on words and examples collected from Dungan-language newspapers and books published before the fall of the Soviet Union. Special attention has been paid not only to vocabulary (9,945 headwords) but also to grammatical features; the dictionary may even provide material for the study of syntax. An effort has been made to find characters for Dungan words in dialect dictionaries published in China.

This work is available through Camphor Press and Amazon.

Note: I am part of Camphor Press and so stand to make a small amount of money from sales of this book. But that’s not why I’m recommending it to everyone interested in Dungan.

Aiyo! OED fails to use Pinyin for some new entries

The Oxford English Dictionary has just added some new entries, including several from Sinitic languages.

A lot of these come by way of Singapore and so reflect the Hokkien language. For example, among the new entries is “ang pow,” the Hokkien equivalent of Mandarin’s “hongbao,” which also made the list.

A few of the entries, however, come from Mandarin: for example, two common interjections expressing surprise. Oddly, though, the OED uses “aiyoh” and “aiyah” instead of the proper Pinyin spellings, “aiyo” and “aiya.”

“Ah,” you say, “but maybe the aiyoh and aiyah spellings are more common in English.”

Nope.

Even in Singapore domains (.sg), the Pinyin spellings are more common than those the OED calls for. As the tables below show, the Pinyin spellings are also more common in every instance in Hong Kong, China, and Taiwan. Throughout the world, the Pinyin spellings are more common, usually by a factor of at least two (in 12 of the 14 comparisons below).

Google search results for “aiyo” (Pinyin) and “aiyoh” (spelling used in the OED)

Search scope                   aiyo     aiyoh
.sg                          12,200     5,680
.hk                           2,570       187
.cn                           6,040       984
.tw                           4,690       196
all domains               1,250,000   137,000
all domains + “chinese”      97,700    77,100
all domains + “mandarin”     51,800    14,100

Google search results for “aiya” (Pinyin) and “aiyah” (spelling used in the OED)

Search scope                   aiya     aiyah
.sg                          17,600     8,310
.hk                           6,400     2,360
.cn                          13,200     1,860
.tw                           5,910     1,710
all domains               3,370,000   332,000
all domains + “chinese”     238,000    63,200
all domains + “mandarin”     36,500    22,800
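The comparisons in those tables are simple ratios, so they can be checked with a few lines of Python. The sketch below only re-enters the hit counts shown above (it does not query Google, and live counts will differ from these) and divides each Pinyin count by the matching OED-spelling count. The names `AIYO`, `AIYA`, and `ratios` are illustrative.

```python
# Google hit counts from the two tables above:
# (search scope, Pinyin spelling hits, OED spelling hits)
AIYO = [
    (".sg", 12_200, 5_680),
    (".hk", 2_570, 187),
    (".cn", 6_040, 984),
    (".tw", 4_690, 196),
    ("all domains", 1_250_000, 137_000),
    ("all domains + chinese", 97_700, 77_100),
    ("all domains + mandarin", 51_800, 14_100),
]
AIYA = [
    (".sg", 17_600, 8_310),
    (".hk", 6_400, 2_360),
    (".cn", 13_200, 1_860),
    (".tw", 5_910, 1_710),
    ("all domains", 3_370_000, 332_000),
    ("all domains + chinese", 238_000, 63_200),
    ("all domains + mandarin", 36_500, 22_800),
]


def ratios(rows):
    """Pair each search scope with Pinyin hits divided by OED-spelling hits."""
    return [(scope, pinyin / oed) for scope, pinyin, oed in rows]


if __name__ == "__main__":
    all_ratios = []
    for label, rows in (("aiyo/aiyoh", AIYO), ("aiya/aiyah", AIYA)):
        for scope, ratio in ratios(rows):
            all_ratios.append(ratio)
            print(f"{label}  {scope:<24} {ratio:5.1f}x")
    total = len(all_ratios)
    print(f"Pinyin more common: {sum(r > 1 for r in all_ratios)} of {total}")
    print(f"At least twice as common: {sum(r >= 2 for r in all_ratios)} of {total}")
```

Run as a script, it shows the Pinyin spelling ahead in all 14 comparisons and at least twice as common in 12 of them; the two exceptions are aiyo with “chinese” (about 1.3 times) and aiya with “mandarin” (about 1.6 times).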

Searching Google Books also reveals that the Pinyin forms are more common.

In short, I do not see any good reason for the OED to have adopted ad hoc spellings rather than the Pinyin standard. They must have their reasons, but it looks like they botched this.

Pinyin sort order

The standard for alphabetically sorting Hanyu Pinyin is given in the ABC dictionary series edited by John DeFrancis and issued by the University of Hawaii Press.

Here’s the basic idea:

The ordering is primarily simply alphabetical. Diacritical marks, punctuation, juncture and capitalization are only taken into account when the strings being compared are otherwise identical. For example, píng’ān sorts before pīnyīn, because pingan sorts before pinyin, because g precedes y alphabetically.

Only when two strings are alphabetically identical is non-alphabetical information taken into account.

The series’ Reader’s Guide presents the specifics of the sort order. Since I don’t have to worry about how much space this takes up on my site, I have reformatted the information slightly to give the examples as numbered lists.

Head entry transcriptions with the same sequence of letters are ordered first strictly by letter sequence regardless of tones, then by initial syllable tone in the sequence 0 1 2 3 4. For entries with the same initial tone, arrangement is by the tone of the second syllable, again in the order 0 1 2 3 4. For example:

  • shīshi
  • shīshī
  • shīshí
  • shīshǐ
  • shīshì
  • shíshī
  • shíshì
  • shǐshī
  • shìshī

Irrespective of tones, entries with the vowel u precede those with ü. For example:

Entries without apostrophe precede those with apostrophe. For example:

  1. biàn: argue
  2. bǐ’àn: the other shore

Lower-case entries precede upper-case entries. For example:

  1. hòujìn: aftereffect
  2. Hòu Jìn: Later Jin dynasty

For entries with identical spelling, including tones, arrangement is by order of frequency….

For most users, the most important thing to note is that the neutral tone is regarded as 0, not as 5. Thus, the order is not “ā á ǎ à a,” but “a ā á ǎ à.” And, because lowercase comes before uppercase, the order is not “A a Ā ā Á á Ǎ ǎ À à” but “a A ā Ā á Á ǎ Ǎ à À.”

One can see this in action in the A entries for the ABC English-Chinese, Chinese-English Dictionary. And here are some sample pages from an earlier ABC dictionary.

The ABC series follows the example of the Hanyu Pinyin Cihui (汉语拼音词汇 / 漢語拼音詞彙 / Hànyǔ Pīnyīn Cíhuì) (example), with only one minor difference, as noted by Tom Bishop:

HPC [Hanyu Pinyin Cihui] gave hyphens and spaces the same priority as apostrophes, so that lìgōng sorted before lǐ-gōng, in spite of the tones. Usage of hyphens and spaces in pinyin is still far from being fully standardized. (The same is true in English orthography.) Consequently, for collation it makes sense to give less weight to hyphens and spaces, and more weight to tones, thus sorting lǐ-gōng before lìgōng. In ABC, hyphens and spaces don’t affect the sort order unless they change the pronunciation in the same way that apostrophe would; for example, ¹míng-àn 明暗 and ²míng’àn 冥暗 are treated as homophones, and they sort after mǐngǎn 敏感.
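The rules above can be approximated with a sort key in Python. This is a sketch under stated assumptions, not the ABC dictionaries’ own procedure. The function name `abc_sort_key` is hypothetical. It finds syllables by treating each run of vowels as one syllable, which works only for correctly written Pinyin (with y and w spellings and apostrophes in place). It treats ü as u at the first level and checks u before ü ahead of apostrophes, an ordering the excerpt does not spell out, and it ignores frequency, the final tie-breaker. Apostrophes are compared before tones, as the biàn/bǐ’àn and mǐngǎn/míng’àn examples require, and, following Tom Bishop’s description, a hyphen or space counts like an apostrophe only when an a, e, or o follows it.

```python
import unicodedata

# After NFD decomposition, each Pinyin tone mark is a combining character.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
DIAERESIS = "\u0308"  # the two dots of ü
VOWELS = "aeiou"


def abc_sort_key(word):
    """Approximate ABC-style collation key for a Pinyin string (a sketch).

    Levels, in order: letters only (ü counted as u), u before ü,
    no juncture before juncture, tones syllable by syllable (neutral = 0),
    lowercase before uppercase. Frequency is not modeled.
    """
    decomposed = unicodedata.normalize("NFD", word)
    letters, umlauts, tones, caps = [], [], [], []
    junctures = 0
    in_vowel_run = False
    prev_base = ""
    for i, ch in enumerate(decomposed):
        if unicodedata.combining(ch):
            # Diacritics attach to the preceding base letter.
            if ch in TONE_MARKS and in_vowel_run:
                tones[-1] = TONE_MARKS[ch]
            elif ch == DIAERESIS and prev_base == "u":
                umlauts[-1] = True
            continue
        if ch.isalpha():
            lower = ch.lower()
            letters.append(lower)
            caps.append(ch.isupper())
            if lower in VOWELS:
                if not in_vowel_run:
                    # A new run of vowels is taken to be a new syllable,
                    # neutral (0) until a tone mark says otherwise.
                    tones.append(0)
                    in_vowel_run = True
                if lower == "u":
                    umlauts.append(False)
            else:
                in_vowel_run = False
        else:
            # Apostrophe, hyphen, or space: counts as a juncture only when
            # the next letter is a, e, or o (i.e., it changes syllabification).
            if decomposed[i + 1:i + 2].lower() in ("a", "e", "o"):
                junctures += 1
            in_vowel_run = False
        prev_base = ch.lower()
    return ("".join(letters), tuple(umlauts), junctures, tuple(tones), tuple(caps))


if __name__ == "__main__":
    words = ["shìshī", "shīshi", "bǐ’àn", "biàn", "Hòu Jìn", "hòujìn", "míng’àn", "mǐngǎn"]
    print(sorted(words, key=abc_sort_key))
```

Sorting with `sorted(words, key=abc_sort_key)` puts biàn before bǐ’àn, lǐ-gōng before lìgōng, mǐngǎn before míng’àn, and hòujìn before Hòu Jìn, matching the examples above.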

New database of cross-strait differences in Mandarin goes online

Last week, on the same day President Ma Ying-jeou accepted the resignation of a minister who made some drunken lewd remarks at a wěiyá (year-end office party), Ma was joking to the media about blow jobs.

Classy.

screenshot from a video of a news story on this

But it was all for a good cause, of course. You see, the Mandarin expression chuī lǎba, when not referring to the literal playing of a trumpet, is usually taken in Taiwan to refer to a blow job. But in China, Ma explained, chuī lǎba means the same thing as the idiom pāi mǎpì (pat/kiss the horse’s ass, i.e., flatter). And now that we have the handy-dandy Zhōnghuá Yǔwén Zhīshikù (Chinese Language Database), which Ma was announcing, we can look up how Mandarin differs in Taiwan and China, and thus not get tripped up by such misunderstandings. Or at least that’s supposed to be the idea.

The database, which is the result of cross-strait cooperation, can be accessed via two sites: one in Taiwan, the other in China.

It’s clear that a lot of money has been spent on this. For example, many entries are accompanied by well-documented, precise explanations by distinguished lexicographers. Ha! Just kidding! Many entries are really accompanied by videos (some two hundred of them) of cutesy puppets gabbing about cross-strait differences in Mandarin expressions. But if there’s a video in there of the panda in the skirt explaining to the sheep in the vest that a useful skill for getting ahead in Chinese society is chuī lǎba, I haven’t found it yet. Will NMA take up the challenge?

Much of the site emphasizes not so much language as Chinese characters. For example, another expensively produced video feeds the ideographic myth by showing off obscure Hanzi, such as the one for chěng.

WARNING: The screenshot below links to a video that contains scenes with intense wawa-ing and thus may not be suitable for anyone who thinks it’s not really cute for grown women to try to sound like they’re only thwee-and-a-half years old.

cheng3

In a welcome bit of synchronicity, Victor Mair posted on Language Log earlier the same week on the unpredictability of Chinese character formation and pronunciation, briefly discussing just such patterns of duplication, triplication, etc.

Mair notes:

Most of these characters are of relatively low frequency and, except for a few of them, neither their meanings nor their pronunciations are known by persons of average literacy.

Many more such characters consisting of two, three, or four repetitions of the same character exist, and their sounds and meanings are in most cases equally or more opaque.

The Hanzi for chěng (which looks like 馬馬馬 run together as one character) in the video above is sufficiently obscure that it likely won’t be shown correctly in many browsers on most systems when written in real text: 𩧢. But never fear: It’s already in Unicode and so should be appearing one of these years in a massively bloated system font.

Further reinforcing the impression that the focus is on Chinese characters, Liú Zhàoxuán, who is the head of the association in charge of the project on the Taiwan side, equated traditional Chinese characters with Chinese culture itself and declared that getting the masses in China to recognize them is an important mission. (Liu really needs to read Lü Shuxiang’s “Comparing Chinese Characters and a Chinese Spelling Script — an evening conversation on the reform of Chinese characters.”)

Then he went on about how Chinese characters are supposedly a great system: they have a one-to-one correspondence with language that other scripts cannot match, people can know what they mean just by looking at them (!), and they therefore have a high degree of artistic quality (gāodù de yìshùxìng). Basically, the person in charge of this project seems to have a bad case of the Like Wow syndrome, which is not a reassuring trait for someone in charge of producing a dictionary.

The same cooperation that built the Web sites led to a new book, Liǎng’àn Měirì Yī Cí (《兩岸每日一詞》 / Roughly: Cross-Strait Term-a-Day Book), which was also touted at the press conference.

The book contains Hanyu Pinyin, as well as zhuyin fuhao. But, alas, the book makes the Pinyin look ugly and fails completely at the first rule of Pinyin: use word parsing. (In the online images from the book, such as the one below, all of the words are se pa ra ted in to syl la bles.)

The Web site also has ugly Pinyin, with the CSS file for the Taiwan site calling for Pinyin to be shown in SimSun, which is one of the fonts it’s better not to use for Pinyin. But the word parsing on the Web site is at least not always wrong. Here are a few examples.

  • “跑神兒” is given as pǎoshénr (good).
  • And apostrophes appear to be used correctly: e.g., fàn’ān (販安), chūn’ān (春安), and fēi’ān (飛安).
  • But “第二春” is run together as “dìèrchūn” (no hyphen) rather than written correctly as dì-èr chūn.
  • And “一個頭兩個大” is given as yíɡe tóu liǎnɡɡe dà (for Taiwan) and yīɡe tóu liǎnɡɡe dà (for China). But ge is supposed to be written separately. (The variation of tone for yi is in this case useful.)

Still, my general impression from this is that we should not expect the forthcoming cross-strait dictionary to be very good.

Further reading:

How to handle ‘de’ and interjections in Hanyu Pinyin

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) deals with how to write Mandarin’s various de’s, mood particles, and interjections.

This reading is available in two versions:

I’ve already written about the principles in previous posts. For example, see

How to write numbers and measure words in Hanyu Pinyin

Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) is about writing numbers and measure words.

This reading is available in two versions:

For more on this, see these posts and the PDFs linked to therein.