Mandarin words with more than one apostrophe

qīng’ěr’értīng
傾耳而聽
listen attentively

As I often note, apostrophes are used in only about 2 percent of words as written in Hanyu Pinyin. But when they’re needed, they’re needed. Don’t skip them.

A few years back, someone wrote to me to ask about multiple apostrophes in Pinyin. I dug through a 2019 edition of the CC-CEDICT (2019-11-12 04:41:56 GMT) for an answer. But I don’t think I ever posted my findings online. It’s time to rectify that.

CC-CEDICT is not an ideal source for counting words. Some entries are phrases rather than single words, yet they are not marked any differently from words. So some entries might be better written with spaces rather than apostrophes, which would reduce both the apostrophe count and the percentage.

So, with that in mind, of the file’s 117,579 entries, 3,006 needed apostrophes, or 2.56 percent.

No entry needed three or more apostrophes.

Only 52 entries needed two apostrophes, or 0.04% of the total (1 per 2,261 entries).

Most of those were just Mandarinized foreign proper nouns. For example:

  • Ā’ěrjí’ěr: Algiers, capital of Algeria/ 阿爾及爾 阿尔及尔
  • Āi’ěrduō’ān: Erdogan (name)/Recep Tayyip Erdoğan (1954-), Turkish politician, prime minister from 2003/ 埃爾多安 埃尔多安
  • Běi’ài’ěrlán: Northern Ireland/ 北愛爾蘭 北爱尔兰
  • Bì’ěrbā’è: Bilbao (city in Spain)/ 畢爾巴鄂 毕尔巴鄂
  • Dá’ěrfú’ěr: Darfur (western province of Sudan)/ 達爾福爾 达尔福尔
  • Dá’ěrfù’ěr: Darfur, region of west Sudan/ 達爾富爾 达尔富尔
  • fēi’ābèi’ěr: (math.) non-abelian/ 非阿貝爾 非阿贝尔
  • Fèi’àoduō’ěr: Theodor or Fyodor (name)/ 費奧多爾 费奥多尔
  • gǔ’ānxiān’àn: glutamine (Gln), an amino acid/ 谷氨酰胺 谷氨酰胺
  • Láiwàng’è’ěr: Levanger (city in Trøndelag, Norway)/ 萊旺厄爾 莱旺厄尔
  • Léi’ā’ěrchéng: Ciudad Real/ 雷阿爾城 雷阿尔城
  • Luójié’ài’ěrzhī: Raziel, archangel in Judaism/ 羅潔愛爾之 罗洁爱尔之
  • Mài’ěrwéi’ěr: Melville (name)/Herman Melville (1819-1891), US novelist, author of Moby Dick / 麥爾維爾 麦尔维尔
  • Pí’āi’ěr: Pierre (name)/ 皮埃爾 皮埃尔
  • Shàng’àisè’ěr: Overijssel/ 上艾瑟爾 上艾瑟尔
  • Sīfú’ěrwǎ’ěr: Svolvær (city in Nordland, Norway)/ 斯福爾瓦爾 斯福尔瓦尔
  • Sītài’ēnxiè’ěr: Steinkjær (city in Trøndelag, Norway)/ 斯泰恩謝爾 斯泰恩谢尔
  • Tèlǔ’āi’ěr: Tergüel or Teruel, Spain/ 特魯埃爾 特鲁埃尔
  • Xīn’ào’ěrliáng: New Orleans, Louisiana/ 新奧爾良 新奥尔良

Examples of more regular Mandarin entries with two apostrophes include:

  • bái’éyàn’ōu: (bird species of China) little tern (Sternula albifrons)/ 白額燕鷗 白额燕鸥
  • báixuě’ái’ái: brilliant white snow cover (esp. of distant peaks)/ 白雪皚皚 白雪皑皑
  • chū’ěrfǎn’ěr: old: to reap the consequences of one’s words (idiom, from Mencius); modern: to go back on one’s word/to blow hot and cold/to contradict oneself/inconsistent/ 出爾反爾 出尔反尔
  • húnhún’è’è: muddleheaded/ 渾渾噩噩 浑浑噩噩
  • pāi’àn’érqǐ: lit. to slap the table and stand up (idiom); fig. at the end of one’s tether/unable to take it any more/ 拍案而起 拍案而起
  • qì’áng’áng: full of vigor/spirited/valiant/ 氣昂昂 气昂昂
  • qīng’ěr’értīng: to listen attentively/ 傾耳而聽 倾耳而听
  • qīqī’ài’ài: stammering (idiom)/ 期期艾艾 期期艾艾
  • suíyù’ér’ān: at home wherever one is (idiom); ready to adapt/flexible/to accept circumstances with good will/ 隨遇而安 随遇而安
  • xiù’ēn’ài: to make a public display of affection/ 秀恩愛 秀恩爱
  • yǐ’échuán’é: to spread falsehoods/to increasingly distort the truth/to pile errors on top of errors (idiom)/ 以訛傳訛 以讹传讹

A few of those present interesting questions in orthography. For example, Xīn’ào’ěrliáng or Xīn Ào’ěrliáng?

But, basically, those entries are outliers. Relatively few words in Pinyin need an apostrophe; only a minute subset of those need two apostrophes; and, to my knowledge, none need three or more apostrophes.

Can you think of any triple-apostrophe words? Sorry, written examples of stuttering don’t count.
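For anyone who wants to run a similar count, here is a minimal sketch in Python. It assumes CC-CEDICT's usual line format (`traditional simplified [pinyin] /gloss/`), with Pinyin given as space-separated syllables carrying tone numbers, such as `[Bei3 Ai4 er3 lan2]`. It applies the basic rule that a syllable beginning with a, o, or e takes an apostrophe when it follows another syllable in the same word. The function names are made up for illustration. Because CC-CEDICT doesn't mark word boundaries, the sketch simply joins every syllable of an entry into one word (the very limitation described above), and it doesn't handle separators such as ·, so don't expect it to reproduce the figures above exactly.

```python
import re
import unicodedata

# Combining marks for tones 1-4; CC-CEDICT writes the neutral tone as 5.
TONE_MARK = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
APOSTROPHE = "\u2019"  # typographic apostrophe, as used in this post
# One CC-CEDICT line: traditional, simplified, [pinyin], /gloss/.../
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def mark_syllable(syllable: str) -> str:
    """Convert a numbered syllable such as 'Ai4' or 'lu:4' to 'Ài' or 'lǜ'."""
    m = re.fullmatch(r"([A-Za-z:]+)([1-5])", syllable)
    if not m:
        return syllable  # not a numbered syllable; leave as-is
    letters = m.group(1).replace("u:", "ü").replace("U:", "Ü")
    tone = int(m.group(2))
    if tone == 5:
        return letters
    low = letters.lower()
    # Usual placement: a or e takes the mark, o takes it in "ou",
    # otherwise the last vowel does.
    if "a" in low:
        pos = low.index("a")
    elif "e" in low:
        pos = low.index("e")
    elif "ou" in low:
        pos = low.index("o")
    else:
        pos = max(low.rfind(v) for v in "iouü")
    if pos < 0:  # vowel-less syllables such as m2 or ng2 stay unmarked
        return letters
    marked = letters[: pos + 1] + TONE_MARK[tone] + letters[pos + 1 :]
    return unicodedata.normalize("NFC", marked)


def join_syllables(numbered: str) -> tuple[str, int]:
    """Join all syllables into one word; return it and its apostrophe count."""
    word, apostrophes = "", 0
    for i, syl in enumerate(numbered.split()):
        marked = mark_syllable(syl)
        if i > 0:
            marked = marked.lower()
            # A non-initial syllable starting with a, o, or e gets an apostrophe.
            if syl[:1].lower() in ("a", "o", "e"):
                word += APOSTROPHE
                apostrophes += 1
        word += marked
    return word, apostrophes


def apostrophe_stats(lines):
    """Count entries, entries needing 1+ apostrophes, and entries needing 2+."""
    total = some = two_plus = 0
    for line in lines:
        m = CEDICT_LINE.match(line)
        if not m:  # skip comment lines (#...) and anything malformed
            continue
        _, n = join_syllables(m.group(3))
        total += 1
        some += int(n >= 1)
        two_plus += int(n >= 2)
    return total, some, two_plus
```

For example, `join_syllables("Bei3 Ai4 er3 lan2")` gives `("Běi’ài’ěrlán", 2)`, and `join_syllables("qing1 er3 er2 ting1")` gives `("qīng’ěr’értīng", 2)`.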

ChatGPT writes poetry in Pinyin, sort of

I asked ChatGPT to “write a poem in Mandarin using Pinyin.”

Its first response took the form of a sonnet:

qīngchūn yǒu tài duō de mèngxiǎng,
xīwàng yǒu yīgè xiǎng zhù de dìfāng.
yīgè rén zài huáshān shàng,
tā de mèng yǒu tèbié de chàng.

zài fēngláng zhī xiàn,
tā yòng bàozhǐ zhànwàng huāliàn.
yǒngqì bǎ dōngfēng zài shǒu,
zài yùndòng zhōng fàng sīwáng zhī hòu.

mèngzhōng yǒu tài duō de jìhuà,
yǒnggǎn bǎ tāmen dōu chéngshí huà.
yǒngyuan bùyào bèi shìjiè suǒ bǎozhòng,
zhǐyào huílái bùwàng jiùshì yīzhǒng.

qīngchūn yǒu tài duō de mèngxiǎng,
wǒmen yīqǐ xiǎng zhù zhè yīqiè de dìfāng.

That one won’t win any prizes, but the result was still much better than the one produced when I asked ChatGPT to make another:

Sì jiāo qíng yuán, xìng fú suì suì,
Chūn fēng yè sè, ruò dàng huā kāi.
Dōu shì shuāng shǒu, shuāng shǒu yī,
Xīn lián xīn shì, shuāng xīn xī.

Qīng sè yǔ tiān, yǔ zhuǎn yǔ duō,
Cǎo yuǎn shān jǐn, dōu xīng huā tuō.
Yuè liàng guāng míng, míng yǔ ruò,
Qíng qíng yǔ hǎo, yǔ zhōng yǔ.

Shēn tián yǔ sǎng, qíng hǎo yǔ,
Liàng yǔ fēn fēn, shǎn chéng yǔ.
Qīng chuān huáng hé, yǔ yíng cháo,
Chūn sè xīn róu, xìng fú mǎo.

Wēn huā yǔ chūn, yǔ xǐ xīn,
Sān yuán sì jì, yǔ zhōng shuāng.
Chūn tiān xìng fú, zhuǎn yǔ shuāng,
Qíng huā yǔ jīng, yǔ yīng huáng.

That opaque mo no syl la bic mess is infected by the spirit of Literary Sinitic, which Pinyin isn’t designed to handle.

I’m hopeful that ChatGPT will get trained properly on the rules of Hanyu Pinyin — both the short-form rules of Pinyin and the longer (and superior) work of Yin Binyong — and so will be able to produce proper Pinyin. But maybe wǒ yǒu tài duō de mèngxiǎng.

As one of my heroes, Hu Shih, wrote one hundred years ago,

Nǐ bùnéng zuò wǒ de shī,
zhèngrú wǒ bùnéng zuò nǐ de mèng

(你不能做我的詩,
正如我不能做你的夢)

(You cannot write my poems,
just as I cannot dream your dreams.)

NB: I should probably remind everyone that if you wish to include Chinese characters or Pinyin with tone marks in a comment, you should encode them first, or they’ll end up scrambled here. (Not my fault. Sorry.)

Microsoft Translator and Pinyin

screenshot of the text described in the post, as treated by Microsoft Translator

If supplied with the following,

談中國的“語”和“文”的問題,我覺得最好能先了解一下在中國通用的語言。中國的主要語言有哪些?為甚麼我說這個,而不說那個?因為環境?因為被強迫?因為我愛這個語言?因為有必要?因為這個語言很重要?也想想什麼是中國人的共同語言。用一個共同語言有必要嗎?為什麼?別的漢語的去向會怎麼樣?如果你使用中國的共同語言普通話,你了解這個語言的語法(比如“的, 得, 地“ 和“了” 的不同用法)嗎? 知道這個語言的基本音節(不包括聲調)只有408個嗎?

Microsoft Translator produces the following Hanyu Pinyin:

tán zhōngguóde “yǔ” hé “wén”dewèntí, wǒjuéde zuìhǎo néng xiānliǎojiě yì xiàzài zhōngguó tōngyòng de yǔyán。 zhōngguóde zhǔyào yǔyán yǒu nǎxiē? wèishénme wǒshuō zhège ,érbùshuōnàgè? yīnwéi huánjìng? yīnwéi bèi qiǎngpò? yīnwéi wǒài zhège yǔyán? yīnwéi yǒubìyào? yīnwéi zhège yǔyán hěnzhòngyào? yě xiǎngxiǎng shénmeshì zhōngguórén de gòngtóngyǔyán。 yòng yígè gòngtóngyǔyán yǒubìyào ma? wèishénme? biéde hànyǔ de qùxiàng huì zěnmeyàng? rúguǒnǐ shǐyòng zhōngguóde gòngtóngyǔyán pǔtōnghuà , nǐ liǎojiě zhège yǔyán de yǔfǎ ( bǐrú “de,dé, de ”hé“le” de bùtóng yòngfǎ )ma? zhīdào zhège yǔyán de jīběn yīnjié (bùbāokuòshēngtiáo) zhǐyǒu 408gèma?

This has a number of obvious problems:

  • failure to capitalize the first letter in a sentence
  • failure to capitalize proper nouns (e.g., “zhongguo” should be “Zhongguo”) (Here is how to handle proper nouns in Pinyin.)
  • frequent appending of “de” to the word before it (Here is how to handle de in Pinyin.)
  • incorrect punctuation: commas, periods, parentheses, and question marks were not converted from their double-width (i.e., Chinese-character) forms to regular roman forms (“，。？（）” should appear instead as “,.?()”)
  • incorrect word parsing (sometimes)

In short: Thumbs-down for now. But it might not take too much work for Microsoft to make this significantly better.
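Two of the problems listed above, the unconverted double-width punctuation and the missing sentence-initial capitals, are mechanical enough to fix in post-processing. The following is a rough Python sketch of such a cleanup pass; it is not anything Microsoft uses, and `tidy_pinyin` is a hypothetical name. It does not touch proper nouns, de, or word parsing, which need a lexicon or real linguistic knowledge, and its spacing rule would wrongly split a decimal number such as 3.5.

```python
import re

# Double-width (Chinese-character) punctuation and its roman counterpart.
FULL_TO_ROMAN = str.maketrans({
    "\uff0c": ",",  # ，
    "\u3002": ".",  # 。
    "\uff1f": "?",  # ？
    "\uff01": "!",  # ！
    "\uff1a": ":",  # ：
    "\uff1b": ";",  # ；
    "\uff08": "(",  # （
    "\uff09": ")",  # ）
})


def tidy_pinyin(text: str) -> str:
    """Romanize punctuation, fix spacing around it, and capitalize sentence starts."""
    text = text.translate(FULL_TO_ROMAN)
    text = re.sub(r"\s+([,.?!:;)])", r"\1", text)    # no space before these marks
    text = re.sub(r"([,.?!:;])(?=\w)", r"\1 ", text)  # one space after, before a word
    text = re.sub(r"\s{2,}", " ", text).strip()
    # Uppercase the first letter of the text and of each new sentence.
    return re.sub(r"(^|[.?!]\s+)([^\W\d_])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
```

Applied to "zhōngguóde zhǔyào yǔyán yǒu nǎxiē？ wèishénme wǒshuō zhège ，érbùshuōnàgè？", it returns "Zhōngguóde zhǔyào yǔyán yǒu nǎxiē? Wèishénme wǒshuō zhège, érbùshuōnàgè?" The attached de and the missing capital on Zhōngguó remain, which shows how much of the job is left.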

Taipei to spend NT$300 million making MRT signage worse

Taipei MRT station
Commonwealth Magazine (Tiānxià zázhì) recently interviewed me for a Mandarin-language piece related to the signage on Taipei’s MRT system.

As anyone who has looked at Pinyin News more than a couple of times over the years should be able to guess, I had a lot to say about that — most of which understandably didn’t make it into the article. For example, I recall making liberal use of the word “bèn” (“stupid”) to describe the situation and the city’s approach. But the reporter — Yen Pei-hua (Yán Pèihuá / 嚴珮華), who is perhaps Taiwan’s top business journalist — diplomatically omitted that.

The article discusses the nicknumbering system that Taipei is determined to implement “for the foreigners,” even though most foreigners are at best indifferent to it. Since my remarks on that didn’t make it into the article, I’ll refer you to my post on the subject from last year: Taipei MRT moves to adopt nicknumbering system. Back then, though, I didn’t know the staggering amount of money the city is going to spend on screwing up the MRT system’s signs: NT$300 million (about US$10 million)! The main reason given for this is the sports event Taipei will host next summer. That event is supposed to last about ten days, which puts the cost of the signs alone at roughly US$1 million per day.

On the other hand, the city does not plan to fix the real problems with the Taipei MRT’s station names, specifically the lack of apostrophes in what should be written Qili’an (not Qilian), Da’an (not Daan) (twice!), Jing’an (not Jingan), and Yong’an (not Yongan) — in Chinese characters: 唭哩岸, 大安, 景安, and 永安, respectively. And then there’s the problem of wordy English names.
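Why do those apostrophes matter? Without them, a string of letters can often be divided into syllables in more than one way. A small Python sketch makes the point by listing every possible segmentation. The syllable inventory here is a toy list (an assumption for illustration) containing only the syllables these station names need, not the full set of roughly 400 Mandarin syllables.

```python
from functools import lru_cache

# Toy syllable inventory, an assumption for illustration only: just the
# syllables needed for these station names, not all ~400 Mandarin syllables.
SYLLABLES = {"qi", "li", "lian", "an", "da", "jin", "jing", "gan", "yong"}
LONGEST = max(len(s) for s in SYLLABLES)


def segmentations(letters: str) -> list[list[str]]:
    """Return every way to split a toneless, apostrophe-free string into known syllables."""

    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple[tuple[str, ...], ...]:
        if i == len(letters):
            return ((),)  # exactly one way to split nothing: no syllables
        parses = []
        for j in range(i + 1, min(i + LONGEST, len(letters)) + 1):
            head = letters[i:j]
            if head in SYLLABLES:
                parses.extend((head,) + rest for rest in split_from(j))
        return tuple(parses)

    return [list(p) for p in split_from(0)]


for name in ("qilian", "jingan", "daan", "yongan"):
    print(name, "->", [" ".join(p) for p in segmentations(name)])
```

With this list, qilian splits as qi li an or qi lian, and jingan as jin gan or jing an. Daan and Yongan happen to have only one parse here, but the rule doesn’t depend on ambiguity: any syllable beginning with a, o, or e that follows another syllable takes an apostrophe.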

Well, take a look and comment: here, or better still, on the Facebook page. I’m grateful to Ms. Yen and Commonwealth for discussing the issue.

PRC’s official rules for Pinyin: 2012 revision — in traditional Chinese characters

Last week I put online China’s official rules for Hanyu Pinyin, the 2012 revision (GB/T 16159-2012). I’ve now made a traditional-Chinese-character version of those rules for Pinyin.

Eventually I’ll also issue versions in Pinyin and English.

cover of GB/T 16159-2012 in traditional Chinese characters
(Note: The image above is of course Photoshopped. I altered the cover of the PRC standard simply to provide an illustration in traditional Chinese characters for this post.)

PRC’s official rules for Pinyin: 2012 revision

In 2012 China revised its official guidelines for writing Pinyin.

These are the Hanyu Pinyin Zhengcifa Jiben Guize (official translation: “Basic Rules of the Chinese Phonetic Alphabet Orthography”), promulgated as GB/T 16159-2012.

Among the changes: some alternate forms are now allowed. For example, “wo de” (my) may also be written as “wode.” I’m not thrilled about that, but I know some people will welcome it.

I’ve added a few notes, for example to point out errors in the original document.

So far I have made only a version in so-called simplified Chinese characters. But eventually I’ll add one in traditional Chinese characters and an English translation.

front cover of GB/T 16159-2012 Pinyin guidelines

Pinyin sort order

The standard for alphabetically sorting Hanyu Pinyin is given in the ABC dictionary series edited by John DeFrancis and issued by the University of Hawaii Press.

Here’s the basic idea:

The ordering is primarily alphabetical. Diacritical marks, punctuation, juncture, and capitalization are taken into account only when the strings being compared are otherwise identical. For example, píng’ān sorts before pīnyīn because pingan sorts before pinyin, since g precedes y alphabetically.

Only when two strings are alphabetically identical is non-alphabetical information taken into account.

The series’ Reader’s Guide presents the specifics of the sort order. Since I don’t have to worry about how much space this takes up on my site, I have reformatted the information slightly to give the examples as numbered lists.

Head entry transcriptions with the same sequence of letters are ordered first strictly by letter sequence regardless of tones, then by initial syllable tone in the sequence 0 1 2 3 4. For entries with the same initial tone, arrangement is by the tone of the second syllable, again in the order 0 1 2 3 4. For example:

  1. shīshi
  2. shīshī
  3. shīshí
  4. shīshǐ
  5. shīshì
  6. shíshī
  7. shíshì
  8. shǐshī
  9. shìshī

Irrespective of tones, entries with the vowel u precede those with ü.

Entries without apostrophe precede those with apostrophe. For example:

  1. biàn: argue
  2. bǐ’àn: the other shore

Lower-case entries precede upper-case entries. For example:

  1. hòujìn: aftereffect
  2. Hòu Jìn: Later Jin dynasty

For entries with identical spelling, including tones, arrangement is by order of frequency….

For most users, the most important thing to note is that the neutral tone is regarded as 0, not as 5. Thus, the order is not “ā á ǎ à a,” but “a ā á ǎ à.” And, because lowercase comes before uppercase, not “A a Ā ā Á á Ǎ ǎ À à” but “a A ā Ā á Á ǎ Ǎ à À.”

One can see this in action in the A entries for the ABC English-Chinese, Chinese-English Dictionary. And here are some sample pages from an earlier ABC dictionary.

The ABC series follows the example of the Hanyu Pinyin Cihui (汉语拼音词汇 / 漢語拼音詞彙 / Hànyǔ Pīnyīn Cíhuì) (example), with only one minor difference, as noted by Tom Bishop:

HPC [Hanyu Pinyin Cihui] gave hyphens and spaces the same priority as apostrophes, so that lìgōng sorted before lǐ-gōng, in spite of the tones. Usage of hyphens and spaces in pinyin is still far from being fully standardized. (The same is true in English orthography.) Consequently, for collation it makes sense to give less weight to hyphens and spaces, and more weight to tones, thus sorting lǐ-gōng before lìgōng. In ABC, hyphens and spaces don’t affect the sort order unless they change the pronunciation in the same way that apostrophe would; for example, ¹míng-àn 明暗 and ²míng’àn 冥暗 are treated as homophones, and they sort after mǐngǎn 敏感.
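The ABC rules can be approximated with a Python sort key. This is a sketch, not the ABC editors’ actual procedure, and the ranking among the tie-breakers is partly inferred: the biàn/bǐ’àn and mǐngǎn/míng’àn examples show that juncture outranks tones, the “a A ā Ā” sequence shows that tones outrank capitalization, and “irrespective of tones” suggests that u versus ü also comes before tones. Putting u/ü ahead of juncture is an assumption. Syllables are found by treating each run of vowels as one syllable, which works for standard Pinyin because a vowel-initial syllable after another syllable is always set off by an apostrophe (syllables that would begin with i, u, or ü are spelled with y or w). Following Bishop’s description, a hyphen or space counts as juncture only when it comes before a, o, or e. The frequency tie-breaker is not implemented, and `abc_sort_key` is a hypothetical name.

```python
import unicodedata

# Combining tone marks (after NFD decomposition) mapped to tones 1-4.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
DIAERESIS = "\u0308"                     # the dots of ü
SEPARATORS = {"'", "\u2019", "-", " "}   # apostrophes, hyphen, space
VOWELS = "aeiou"


def abc_sort_key(entry: str):
    """Approximate ABC order: letters, u/ü, juncture, tones (0-4), then case."""
    letters, u_flags, tones, upper = [], [], [], []
    juncture = 0             # separators that set off an a/o/e syllable
    after_separator = False
    in_vowel_run = False     # each run of vowels is one syllable's nucleus
    for ch in unicodedata.normalize("NFD", entry):
        if ch in TONE_MARKS:
            if in_vowel_run:
                tones[-1] = TONE_MARKS[ch]
            continue
        if ch == DIAERESIS:
            if u_flags:
                u_flags[-1] = 1  # the u just seen is really ü
            continue
        if unicodedata.combining(ch):
            continue             # ignore any other diacritic
        if ch in SEPARATORS:
            after_separator, in_vowel_run = True, False
            continue
        base = ch.lower()
        if after_separator and base in "aoe":
            juncture += 1
        after_separator = False
        letters.append(base)
        upper.append(ch.isupper())
        if base == "u":
            u_flags.append(0)
        if base in VOWELS:
            if not in_vowel_run:
                tones.append(0)  # new syllable: neutral (0) until a mark appears
            in_vowel_run = True
        else:
            in_vowel_run = False
    return ("".join(letters), tuple(u_flags), juncture, tuple(tones), tuple(upper))
```

For instance, `sorted(["pīnyīn", "píng’ān", "bǐ’àn", "biàn"], key=abc_sort_key)` yields `["biàn", "bǐ’àn", "píng’ān", "pīnyīn"]`. Because Python’s sort is stable, true homophones such as míng-àn and míng’àn keep their input order.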

How to write adjectives in Hanyu Pinyin

cover image for the book
Today’s selection from Yin Binyong’s Xīnhuá Pīnxiě Cídiǎn (《新华拼写词典》 / 《新華拼寫詞典》) deals with how to write Mandarin’s adjectives.

This reading is available in two versions: