New Zealand, language, and ‘Chinese’

Raymond Huo, who served as a member of New Zealand’s parliament from 2008 to 2014, was born in China and moved to New Zealand from Beijing twenty-one years ago. His biography at the New Zealand Chinese Language Week Charitable Trust, an organization of which he is a co-chairman, states that he “has published seven books including two Chinese-English dictionaries as joint editor/translator.”

So when you hear that he is unhappy about how Statistics New Zealand is handling Mandarin, Cantonese, etc., in its count of languages, you might be inclined to think he is an expert who is battling ignorance in the bureaucracy. But read on.

“Treating Mandarin, Yue or other Chinese dialects as independent languages is deeply flawed,” Mr Huo said.

“It is similar to making statistical inferences about the difference between Northern English, Oceania English and Indian English, or … between pub talk and the King’s English.

“As such, English may not be the most widely spoken language if each ‘dialect’ was treated as an independent language as in the case of Mandarin and Cantonese.”

This is simply wrong. English as spoken in India, English as spoken in Oceania, and English as spoken elsewhere are all one language. Mandarin and Cantonese are not.

As expected, here comes something about Chinese characters.

The Chinese written script is broadly the same, but a single character can be pronounced in over 1000 different ways across China, according to Mr Huo.

That, however, doesn’t make “Chinese” one language. And focusing on Chinese characters is often a sign that someone has lost track of the language itself — or languages themselves, in this case.

Huo said the ranking order of English, te reo Maori, Samoan, and Hindi as the top four most spoken languages in New Zealand by Statistics NZ was “incorrect, misleading and deeply flawed.” He wants them all counted together, which would move “Chinese” into third place.

Census general manager Denise McGregor, however, said it is important to have a system of classification that enables languages to be either grouped or looked at individually.

“It’s incredibly useful to know that in a school zone, or at a specific library, or on a particular bus route there will be people who speak specifically Mandarin or Chinese,” she said.

“Just knowing they speak ‘Chinese’ isn’t likely to be as useful in targeting services.”

In the last Census, 52,263 people spoke Northern Chinese which includes Mandarin, 44,625 spoke Yue that includes Cantonese and 42,750 spoke a “Sinitic” language.

Mrs McGregor said of the 171,204 people in New Zealand of Chinese ethnicity, 45,216 were born here.

“The majority of these people do not speak any language other than English,” she said.

“We think the rich picture of the different Chinese languages and dialects is a valuable thing to have.”

Amen to that last thought. And I welcome the use of the phrase Sinitic language.

The author of the news article on this spoke with several other people.

David Soh, editor for Auckland-based Chinese language daily Mandarin Pages, said the Census figures for Mandarin speakers were “too low” to be correct.

“The figure that just over a quarter of the Chinese population are Mandarin speakers sounds too low to be accurate or true,” Mr Soh said.

“The fact is Chinese who speak Chinese dialects are often also able to converse in Mandarin, but the Census figure doesn’t seem to reflect that.”

AUT’s head of the School of Language and Culture, Sharon Harvey, said linguists would consider Chinese dialects as independent languages.

“It suits the Chinese Government to say all these languages are ‘only’ dialects but most linguists would say many are languages in their own right.”

Cantonese is a language with nine spoken tones but in Mandarin there are four, said Dr Harvey, and it would be hard to learn Cantonese and “make all those sounds” if someone hasn’t learned them as a child.

The article closes with some figures, taken from New Zealand’s Census 2013, of possible interest:
NEW ZEALAND CHINESE BY NUMBERS

  • 171,204 — population total
  • 122,964 — speak at least one or more Chinese languages
  • 45,216 — NZ born, most speak only English
  • 52,263 — speak Northern Chinese, including Mandarin
  • 44,625 — speak Yue, including Cantonese
  • 42,750 — speak a Sinitic language without further defining

source: How many people in NZ speak Chinese?, New Zealand Herald, December 3, 2015

Taipei MRT moves to adopt nicknumbering system

“He’s much too unreasonable,” interrupted the Mathemagician again. “Why, just last month I sent him a very friendly letter, which he never had the courtesy to answer. See for yourself.”

He handed Milo a copy of the letter, which read:

4738 1919,

667 394017 5841 62589
85371 14 39588 7190434 203
27689 57131 481206.

5864 98053,
62179875073

“But maybe he doesn’t understand numbers,” said Milo, who found it a little difficult to read himself.

“NONSENSE!” bellowed the Mathemagician. “Everyone understands numbers….”

— from The Phantom Tollbooth, by Norton Juster

The Taipei MRT system has announced that it may be adopting a nicknumbering system for stations within the system.

Bad idea.

And, really, it should be obvious even to city officials what a bad idea this is, given what a complete failure the city’s previous attempt at a nicknumbering system was. (The old attempt, from 2000, had Ma Ying-jeou adding things such as “4th Blvd” to road signs rather than simply fixing the signs to use correct Hanyu Pinyin. But the MRT system has used Hanyu Pinyin for years, so foreigners aren’t complaining about a lack of that in 2015.)

I have, however, been complaining for many years about mistakes in the names of some MRT stations and how the MRT system has chosen some bad names. To no avail. But when a politician with no particular history that I’ve seen of giving a damn about what foreigners in Taiwan want decides to grandstand his half-cocked notion, the authorities behind the MRT system jump to implement it, no matter what the supposed beneficiaries might want. Shame on them.

Indeed, this particular politician’s history is of opposition to what foreigners want in terms of signage, as shown by his partisan remarks in favor of Tongyong Pinyin (which is widely despised by Taiwan’s foreign population) and against Hanyu Pinyin (which is almost universally preferred). So I see ample reason to question his motives here.

This new nicknumbering system, by which MRT stations will be assigned additional names (e.g., “R13” and “O11”, for one particular station) is being touted as something aimed at helping foreigners. But I know of no foreigners who have needed any great help on the MRT system — at least not since the city finally implemented Hanyu Pinyin many years ago. Certainly there has been no great outcry from foreigners for any change of this sort. Instead, the nicknumbering system is simply a bad idea that will make things worse, not better. And it will be expensive to implement — money down the drain.

Let’s look at the fragment of the nicknumbering map that the Taipei City Government included with its post.

Taipei MRT nicknumbering map fragment

Try to ignore the horrific clutter for the moment.

Note the red line (which also has a line number … that no one uses except for the MRT system itself in its announcements, something implemented in the previous bad idea from the MRT system). Anyway, along the red line, Da’an Park (which the MRT system wrongly labels “Daan Park”) is nicknumbered “R06,” Da’an as R05, and Xinyi Anhe as R04. That would make Taipei 101 / World Trade Center station R03; and Xiangshan, which is presently the terminus, would be R02. The problem here is that at least two more stations are already planned for that end of the line: Songde (松德) and Zhongpo (中坡); that would mean the final(?) station would need to be oddly nicknumbered R00, though there are no other zero stations given elsewhere. And if any stations are added after that, either the whole system would need to be renumbered or the numbers would need to head into negatives. Absurd! Such is likely also the case with other lines.
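The countdown problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numbers are those read off the map fragment above, the order of the two planned stations is assumed to match the order in which they are named, and the helpers `extend_line` and `label` are hypothetical names, not anything the MRT system uses.

```python
# Nicknumbers on the red line as described above, counting down toward the terminus.
red_line = [
    ("Da'an Park", 6),
    ("Da'an", 5),
    ("Xinyi Anhe", 4),
    ("Taipei 101 / World Trade Center", 3),
    ("Xiangshan", 2),
]


def extend_line(stations, new_names):
    """Append stations past the current terminus, continuing the countdown."""
    extended = list(stations)
    next_number = extended[-1][1] - 1
    for name in new_names:
        extended.append((name, next_number))
        next_number -= 1
    return extended


def label(number):
    """Two-digit, zero-padded form used on the map fragment (e.g. R04)."""
    return f"R{number:02d}"


# Assumption: Songde comes before Zhongpo when heading away from Xiangshan.
planned = extend_line(red_line, ["Songde", "Zhongpo"])
for name, number in planned:
    print(label(number), name)
```

With the two planned stations added, the last one lands on zero, and any further extension would push the count below zero.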

This is the sort of thing that strongly indicates that the authorities haven’t really thought this through. They’re just going forward anyway, which is foolish.

For that matter, why are there zeros marked in the numbers below ten? (For example, why “R04” rather than “R4”?) Putting zeros next to the capital letter O (for the orange line) is certainly not going to help clarity either. For example, are people going to get “O05” right at a glance? I doubt it.

Let’s get back to the matter of clutter. This is a real problem. The more information crammed into a map, the less clear the individual elements are.

And unlike distinct station names, nicknumbers are not easy to remember. If any foreign tourist asks someone how to get to BL13, for example, people likely won’t know how to answer them. Nicknumbering is thus the opposite of helpful, which is likely part of why almost no subway system in the world uses this, other than Tokyo, whose system is much larger than Taipei’s.

Also, I can’t help but wonder how they are planning on handling this in the announcements within the cars. Those announcements are in four languages (Mandarin, Taiwanese, Hakka, and English), which takes some time to get through. Adding nicknumbers in all of those languages is going to make for never-ending talking on the announcement system — and that’s without even figuring in the nicknumbers of transfer stations as well.

I note that, to date, the comments in English to the city’s Facebook post on this are more than twenty to one in opposition to the new system. Is anyone in the city government paying attention? I hope that readers here will add their own comments to the city’s Facebook page on this. (I’m not on Facebook myself.)

The last time the city of Taipei implemented nicknumbering for anything, this was met with near-universal derision from those it was supposedly designed to help. Most people in Taiwan’s foreign community quickly recognized it was a terrible idea — really, really terrible — which unfortunately didn’t stop Taipei from cluttering up the city’s signage with largely useless information. I would have thought that the city would have learned its lesson by now.

Ma Ying-jeou gives a thumbs-up in front of a nicknumbering system street sign in Taipei
This photo from 2000 shows an almost perfect storm of bad ideas supposedly meant to help foreigners. Ma Ying-jeou, during his days as mayor of Taipei, gives a thumbs-up to a road sign with his new nicknumbering system. And above the sign for 4th Blvd is a street sign from Chen Shui-bian’s tenure as mayor. It’s in the much-hated Tongyong Pinyin romanization system — or what was Tongyong Pinyin until the designers of Tongyong Pinyin changed the system (e.g., zh –> jh) and made a lot of their own signs wrong. And to top it off, it employs InTerCaPiTaLiZaTion, another annoying bad idea that still infects the street signs of Taipei.

Here, Taipei City Government officials, is what most foreigners need and want: correct Hanyu Pinyin. For the most part, that’s what the MRT system already has. Don’t screw it up.

Common Taiwanese given names

This supplies the most common male and female given names in Taiwan. If you’re writing a story about Taiwan and need “safe” names for characters, this is a good reference — at least if your story is set in the present or not too far past.

For the most common family names in Taiwan, see Taiwan personal names: a frequency list. The data there are a few years older but remain valid, with only slight changes in the order of frequency. And don’t forget that over here the family name comes first, e.g., “Chen Ya-ting,” not “Ya-ting Chen.”

For the rankings of individual names in given years, see my PDF of the most common given names in Taiwan.

Note: Although I refer to these as “Taiwanese” names, I give the Mandarin forms (since Hanyu Pinyin is a system for writing Mandarin), not names in Hoklo/Hokkien (the language often referred to as Taiwanese).

Most popular given names for Taiwanese males, born 1976–1994

Hanzi Pinyin Spelling Likely Used by This Person
柏翰 Bǎihàn Pai-han
承翰 Chénghàn Cheng-han
冠霖 Guānlín Kuan-lin
冠廷 Guāntíng Kuan-ting
冠宇 Guānyǔ Kuan-yu
家豪 Jiāháo Chia-hao
家銘 Jiāmíng Chia-ming
建宏 Jiànhóng Chien-hung
家瑋 Jiāwěi Chia-wei
俊宏 Jùnhóng Chun-hung
俊傑 Jùnjié Chun-chieh
俊賢 Jùnxián Chun-hsien
威廷 Wēitíng Wei-ting
信宏 Xìnhóng Hsin-hung
彥廷 Yàntíng Yan-ting
宇軒 Yǔxuān Yu-hsuan
哲瑋 Zhéwěi Che-wei
志豪 Zhìháo Chih-hao
志宏 Zhìhóng Chih-hung
志偉 Zhìwěi Chih-wei
宗翰 Zōnghàn Tsung-han

Most popular given names for Taiwanese females, born 1976–1994

Hanzi Pinyin Likely Spelling
慧君 Huìjūn Hui-chun
惠如 Huìrú Hui-ju
惠婷 Huìtíng Hui-ting
惠雯 Huìwén Hui-wen
佳樺 Jiāhuà Chia-hua
佳慧 Jiāhuì Chia-hui
佳玲 Jiālíng Chia-ling
嘉玲 Jiālíng Chia-ling
佳蓉 Jiāróng Chia-jung
佳穎 Jiāyǐng Chia-ying
家瑜 Jiāyú Chia-yu
靜宜 Jìngyí Ching-yi
靜怡 Jìngyí Ching-yi
美玲 Měilíng Mei-ling
佩君 Pèijūn Pei-chun
佩珊 Pèishān Pei-shan
詩涵 Shīhán Shih-han
詩婷 Shītíng Shih-ting
淑芬 Shūfēn Shu-fen
淑華 Shūhuá Shu-hua
淑惠 Shūhuì Shu-hui
淑慧 Shūhuì Shu-hui
淑娟 Shūjuān Shu-chuan
淑玲 Shūlíng Shu-ling
淑貞 Shūzhēn Shu-chen
思穎 Sīyǐng Ssu-ying
婷婷 Tíngtíng Ting-ting
庭瑋 Tíngwěi Ting-wei
婉婷 Wǎntíng Wan-ting
琬婷 Wǎntíng Wan-ting
瑋婷 Wěitíng Wei-ting
筱涵 Xiǎohán Hsiao-han
心怡 Xīnyí Hsin-yi
欣怡 Xīnyí Hsin-yi
馨儀 Xīnyí Hsin-yi
雅芳 Yǎfāng Ya-fang
雅涵 Yǎhán Ya-han
雅惠 Yǎhuì Ya-hui
雅慧 Yǎhuì Ya-hui
雅玲 Yǎlíng Ya-ling
雅萍 Yǎpíng Ya-ping
雅琪 Yǎqí Ya-chi
雅婷 Yǎtíng Ya-ting
雅文 Yǎwén Ya-wen
雅雯 Yǎwén Ya-wen
雅筑 Yǎzhù Ya-chu
怡安 Yí’ān Yi-an
宜君 Yíjūn Yi-chun
怡君 Yíjūn Yi-chun
怡伶 Yílíng Yi-ling
怡如 Yírú Yi-ju
宜庭 Yítíng Yi-ting
怡婷 Yítíng Yi-ting
依婷 Yītíng Yi-ting
怡萱 Yíxuān Yi-hsuan
郁婷 Yùtíng Yu-ting
鈺婷 Yùtíng Yu-ting
郁雯 Yùwén Yu-wen

The names were derived from Chih-Hao Tsai’s list of 25 most common given names by year. I have added Pinyin and the spelling in the romanization system likely used by someone in Taiwan with that name (bastardized Wade-Giles). In addition, with the help of my wife, I assigned names to the categories of male or female.

The data are from the university entrance exams, 1994–2012. Positing that the students were age 18 when they took the exam supplies the range for years of birth.
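The birth-year range follows from simple subtraction. A minimal sketch, using only the exam years and the age-18 assumption stated above (the variable names are illustrative):

```python
EXAM_YEARS = range(1994, 2013)  # exam years 1994 through 2012, inclusive
ASSUMED_AGE_AT_EXAM = 18        # the assumption stated in the text

# Subtract the assumed age from each exam year to estimate the year of birth.
birth_years = [year - ASSUMED_AGE_AT_EXAM for year in EXAM_YEARS]
print(min(birth_years), max(birth_years))
```

Students who took the exam at a different age would shift the range by a year or so in either direction.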

Sign in seal script

It’s time again to play What’s That Character?

Feel free to ask others what they think, though enlisting the aid of historians and calligraphy masters would count as cheating. None of these examples is from a museum or a calligraphy scroll; they are all from a sign outside a building, a sign meant to be read by all.

Chinese character number one:
mystery_hanziv

Chinese character number two:
mystery_hanziii

And Chinese character number three:
mystery_hanziiv

OK? Ready?

Here are the answers:

How’d you do?

*** SPOILERS BELOW ***

If you got even one right, hái bùcuò. That’s probably as well as or better than the average person literate in Chinese characters.

Here is the entire sign, which will probably make things much clearer.

If you’re a Mandarin speaker and used to reading Chinese characters, you can probably tell what the entire sign says without too much effort. But as this exercise may help to show, that is not because most people can truly read all the characters but because they can fill in the blanks, as it were, when presented with adequate context. Yes, those are all written in seal script, not in a modern style; but seal script is all that is given on the sign.

I want to stress that this isn’t a sign for a historical museum or even the Cultural Bureau. Nope, it’s for the Xinbei City Government’s Environmental Protection Department, here in lovely Banqiao. Those used to the ways of Taiwan (or maybe just the ways of the world) have probably already correctly guessed that it was the director who thought a seal-script font would be a good idea. (See the news stories below for more on that. Although the reports are from a couple of years ago, I took the photos just a couple of weeks ago.)

Don’t forget: If you want to put Chinese characters or tonal Pinyin in your comments, use the encoder first and copy and paste the results into the comments box.

News stories:

Dissolving Pinyin

Late last week, Victor Mair (with some assistance from Matt Anderson, David Moser, me, and others) wrote a post, “‘Lobsters’: a perplexing stop motion film,” about a short 1959 film from China that gives some Pinyin. In some cases, the Pinyin is presented for a second and then is quickly dissolved into Chinese characters. Since Victor’s post supplies only the text, I thought that I’d supplement that here with images from the film.

See the original post for translations and discussion.

The film often shows a newspaper. The headline (at 7:57) reads (or rather should read, since the first word is misspelled):

QICHE GUPIAO MENGDIE
DAPI LONGXIA ZHIXIAO

longxia_pinyin_757

But since the image above doesn’t show the name of the paper, I’m also offering this rotated and cropped photo, which allows us to see that this is the “JIN YUAN DIGUO RI-BAO.”
longxia_ribao

Elsewhere, there are again some g’s for q’s. For the first example of text dissolving from Pinyin to Chinese characters (at 2:11), I’m offering screenshots of the text in Pinyin, the text during the dissolve, and the text in Chinese characters. Later I’ll give just the Pinyin and Chinese characters.

Hongdang Louwang
Yipi hongdang zai daogi [sic] jiudian jihui buxing guanbu [sic] louwang

longxia_pinyin_211a

longxia_pinyin_211b

longxia_pinyin_211c

Soon thereafter (at 2:44), we get a handwritten note.
longxia_pinyin_244a

longxia_pinyin_244c

At 3:39 we’re shown the printed notice in the newspaper of the above text.
longxia_pinyin_339a

longxia_pinyin_339c

A brief glance at the newspaper at 3:23 gives us FA CHOU, which is probably referring to the stink the bad lobsters are giving off.
longxia_pinyin_323

Here a man is carrying a copy of Zibenlun (Das Kapital), by Makesi (Marx).
longxia_pinyin_911

Actually, it’s not really Das Kapital, just the cover of the book; inside is a stack of decadent Western material. “MEI NE” is probably supposed to be “MEINÜ” (beautiful women).
longxia_pinyin_558_meinu

I imagine that, in the PRC of 1959, the artists for this film must have inwardly rejoiced at the chance to draw something like that for a change, and that is also why there’s a nude on the wall in one scene.
longxia_pinyin_439_meinu

UTF-8 Unicode vs. other encodings over time

Some eight years ago UTF-8 (Unicode) became the most used encoding on Web pages. At the time, though, it was used on only about 26% of Web pages, so it had a plurality but not an absolute majority.

Graph showing growth of the UTF-8 encoding

By the beginning of 2010 Unicode was rapidly approaching use on half of Web pages.
graph showing a steep rise in the use of UTF-8 and a steep decline in other major encodings

In 2012 the trends were holding up.
UTF-8_website_use_2001-2012

Note that the 2008 crossover point appears different in the latter two Google graphs, which is why I’m showing all three graphs rather than just the third.

A different source (with slightly different figures) provides us with a look at the situation up to the present, with UTF-8 now on 85% of Web pages. Expansion of UTF-8 is slowing somewhat. But that may be due largely to the continuing presence of older websites in non-Unicode encodings rather than lots of new sites going up in encodings other than UTF-8.
growth in Unicode UTF-8 encoding on Web pages, 2010-2015

Here’s the same chart, but focusing on encodings (other than UTF-8) that use Chinese characters, so the percentages are relatively low.
asian_language_encodings_2010-2015

And here’s the same as the above, but with the results for individual languages combined.
asian_language_encodings_2010-2015_by_language

By the way, Pinyin.info has been in UTF-8 since the site began way back in 2001. The reason that Chinese characters and Pinyin with tone marks appear scrambled within Pinyin News is that a hack caused the WordPress database to be set to Swedish (latin1_swedish_ci), of all things. And I haven’t been able to get it fixed; so just for the time being I’ve given up trying. One of these days….
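Scrambling of this kind is typical of UTF-8 bytes being read back as a single-byte Latin-1-style encoding. The Python sketch below shows the general effect, assuming that is roughly what happens here. The sample string is made up for illustration, and the real WordPress/MySQL behavior may differ in detail (MySQL's latin1 is closer to Windows-1252 than to ISO 8859-1).

```python
# Hypothetical sample: tonal Pinyin plus Chinese characters.
text = "Hànyǔ Pīnyīn 漢語拼音"

# Store the text as UTF-8 bytes, then (wrongly) read each byte as one Latin-1 character.
utf8_bytes = text.encode("utf-8")
scrambled = utf8_bytes.decode("latin-1")
print(scrambled)

# If no bytes were lost along the way, reversing the mistake recovers the original.
recovered = scrambled.encode("latin-1").decode("utf-8")
print(recovered == text)
```

Each accented letter turns into two characters and each Chinese character into three, which is why the scrambled text is so much longer than the original.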

Pinyin font: Skarpa

Today’s Pinyin-friendly font is Skarpa, by Aga Silva of Poland. It’s a bit quirky (e.g., second-tone o’s and lowercase q’s) but still sharp.

Hanyu Pinyin pangram using the Skarpa font

Skarpa was later modified into Skarpa 2, which is not free but which comes in several weights and types.

Most of Silva’s other fonts also can handle Pinyin with tone marks. Those are all commercial rather than free.

Popularity of Chinese character country code TLDs

Yesterday we looked at the popularity of the Chinese character TLD for Singapore Internet domains. Today we’re going to examine the Chinese character ccTLDs (country code top-level domains) for those places that use Chinese characters and compare the figures with those for the respective Roman alphabet TLDs.

In other words, how, for example, does the use of .台灣 domains compare with the use of .tw domains?

Since, unlike the case with Singapore, I don’t have the registration figures, I’m having to make do with Google hits, which is a different measure. For this purpose, Google is unfortunately a bit of a blunt instrument. But at least it should be a fairly evenhanded blunt instrument and will be useful in establishing baselines for later comparisons.

A few notes before we get started:

  • Japan has yet to bother with completing the process for its own name in kanji (Japan, as written in kanji / Chinese characters), so it is omitted here.
  • Macau only recently asked for .澳门 and .澳門, so those figures are still at zero.
  • Oddly enough, there’s no .臺灣 ccTLD, even though the Ma administration, which was in power when Taiwan’s ccTLDs went into effect, officially prefers the more complex form .臺灣 to .台灣, not to mention preferring it to .台湾.

TLD        Google Hits    Percent of Total

MACAU
.mo        18400000       100.00
.澳门      0              0.00
.澳門      0              0.00

TAIWAN
.tw        206000000      99.86
.台湾      67600          0.03
.臺灣      0              0.00
.台灣      230000         0.11

HONG KONG
.hk        193000000      99.94
.香港      118000         0.06

SINGAPORE
.sg        97800000       100.00
.新加坡    2              0.00

CHINA
.cn        315000000      99.61
.中国      973000         0.31
.中國      251000         0.08

So in no instance does the Chinese character ccTLD reach even one half of one percent of the total for any given place.
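The percentages can be recomputed from the Google hit counts above. The following is a small sketch, not the method actually used to build the table: the `hits` dictionary simply copies the figures given, and `hanzi_share` is a hypothetical helper that treats any TLD containing non-ASCII characters as a Chinese character ccTLD.

```python
# Google hit counts copied from the table above.
hits = {
    "Macau": {".mo": 18_400_000, ".澳门": 0, ".澳門": 0},
    "Taiwan": {".tw": 206_000_000, ".台湾": 67_600, ".臺灣": 0, ".台灣": 230_000},
    "Hong Kong": {".hk": 193_000_000, ".香港": 118_000},
    "Singapore": {".sg": 97_800_000, ".新加坡": 2},
    "China": {".cn": 315_000_000, ".中国": 973_000, ".中國": 251_000},
}


def hanzi_share(counts):
    """Percent of a place's total hits that fall under Chinese character ccTLDs."""
    total = sum(counts.values())
    # Any TLD with non-ASCII characters after the dot counts as a Chinese character ccTLD.
    hanzi = sum(n for tld, n in counts.items() if not tld[1:].isascii())
    return 100 * hanzi / total


for place, counts in hits.items():
    print(f"{place}: {hanzi_share(counts):.2f}%")
```

Summing the Chinese character TLDs for each place, rather than looking at them one at a time, still leaves every place below half a percent.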

Here are the results in a chart.

Graph showing that although China leads in domains in Chinese characters, they do not reach even one half of one percent of the total for China

Note that the ratios of simplified to traditional forms in China and Taiwan are roughly mirror images of each other, as is perhaps to be expected.

See also Platform on Tai, Pinyin News, December 30, 2011