New database of cross-strait differences in Mandarin goes online

Last week, on the same day President Ma Ying-jeou accepted the resignation of a minister who made some drunken lewd remarks at a wěiyá (year-end office party), Ma was joking to the media about blow jobs.

Classy.

screenshot from a video of a news story on this

But it was all for a good cause, of course. You see, the Mandarin expression chuī lǎba, when not referring to the literal playing of a trumpet, is usually taken in Taiwan to refer to a blow job. But in China, Ma explained, chuī lǎba means the same thing as the idiom pāi mǎpì (pat/kiss the horse’s ass — i.e., flatter). And now that we have the handy-dandy Zhōnghuá Yǔwén Zhīshikù (Chinese Language Database), which Ma was announcing, we can look up how Mandarin differs in Taiwan and China, and thus not get tripped up by such misunderstandings. Or at least that’s supposed to be the idea.

The database, which is the result of cross-strait cooperation, can be accessed via two sites: one in Taiwan, the other in China.

It’s clear that a lot of money has been spent on this. For example, many entries are accompanied by well-documented, precise explanations by distinguished lexicographers. Ha! Just kidding! Many entries are really accompanied by videos — some two hundred of them — of cutesy puppets gabbing about cross-strait differences in Mandarin expressions. But if there’s a video in there of the panda in the skirt explaining to the sheep in the vest that a useful skill for getting ahead in Chinese society is chuī lǎba, I haven’t found it yet. Will NMA take up the challenge?

Much of the site emphasizes not so much language as Chinese characters. For example, another expensively produced video feeds the ideographic myth by showing off obscure Hanzi, such as the one for chěng.

WARNING: The screenshot below links to a video that contains scenes with intense wawa-ing and thus may not be suitable for anyone who thinks it’s not really cute for grown women to try to sound like they’re only thwee-and-a-half years old.

cheng3

In a welcome bit of synchronicity, Victor Mair posted on Language Log earlier the same week on the unpredictability of Chinese character formation and pronunciation, briefly discussing just such patterns of duplication, triplication, etc.

Mair notes:

Most of these characters are of relatively low frequency and, except for a few of them, neither their meanings nor their pronunciations are known by persons of average literacy.

Many more such characters consisting of two, three, or four repetitions of the same character exist, and their sounds and meanings are in most cases equally or more opaque.

The Hanzi for chěng (which looks like 馬馬馬 run together as one character) in the video above is sufficiently obscure that it likely won’t be shown correctly in many browsers on most systems when written in real text: 𩧢. But never fear: It’s already in Unicode and so should be appearing one of these years in a massively bloated system font.
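Part of the reason a character like 𩧢 so often shows up as an empty box is where it sits in Unicode. The later CJK extension blocks lie outside the Basic Multilingual Plane (BMP), so their characters need four bytes in UTF-8 and a surrogate pair in UTF-16, and default system fonts often have no glyphs for them. Below is a small Python sketch that reports a character's code point, plane, and encoded sizes. The helper name `describe` is made up for this example and does not come from any library.

```python
import unicodedata

def describe(ch: str) -> str:
    """Report a single character's code point, Unicode name, plane, and encoded sizes."""
    cp = ord(ch)
    plane = "BMP" if cp <= 0xFFFF else "supplementary plane"
    # Fall back to a placeholder if this Python build has no name for the code point.
    name = unicodedata.name(ch, "<unnamed>")
    utf8_bytes = len(ch.encode("utf-8"))
    # Each UTF-16 code unit is 2 bytes; supplementary characters need a surrogate pair.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return (f"U+{cp:04X} {name}: {plane}, {utf8_bytes} UTF-8 bytes, "
            f"{utf16_units} UTF-16 code unit(s)")

print(describe("馬"))  # the ordinary "horse" character, for comparison
print(describe("𩧢"))
```

Running it on 𩧢 shows where that character falls. The output for 馬 (U+99AC, in the BMP) gives a baseline to compare against.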

Further reinforcing the impression that the focus is on Chinese characters, Liú Zhàoxuán, who is the head of the association in charge of the project on the Taiwan side, equated traditional Chinese characters with Chinese culture itself and declared that getting the masses in China to recognize them is an important mission. (Liu really needs to read Lü Shuxiang’s “Comparing Chinese Characters and a Chinese Spelling Script — an evening conversation on the reform of Chinese characters.”)

Then he went on about how Chinese characters are a great system because, supposedly, they have a one-to-one correspondence with language that other scripts cannot match and people can know what they mean by looking at them (!) and that they therefore have a high degree of artistic quality (gāodù de yìshùxìng). Basically, the person in charge of this project seems to have a bad case of the Like Wow syndrome, which is not a reassuring trait for someone in charge of producing a dictionary.

The same cooperation that built the Web sites led to a new book, Liǎng’àn Měirì Yī Cí (《兩岸每日一詞》 / Roughly: Cross-Strait Term-a-Day Book), which was also touted at the press conference.

The book contains Hanyu Pinyin, as well as zhuyin fuhao. But, alas, the book makes the Pinyin look ugly and fails completely at the first rule of Pinyin: use word parsing. (In the online images from the book, such as the one below, all of the words are se pa ra ted in to syl la bles.)

The Web site also has ugly Pinyin, with the CSS file for the Taiwan site calling for Pinyin to be shown in SimSun, which is one of the fonts it’s better not to use for Pinyin. But the word parsing on the Web site is at least not always wrong. Here are a few examples.

  • “跑神兒” is given as pǎoshénr (good).
  • And apostrophes appear to be used correctly: e.g., fàn’ān (販安), chūn’ān (春安), and fēi’ān (飛安).
  • But “第二春” is run together as “dìèrchūn” (no hyphen) rather than written correctly as dì-èr chūn.
  • And “一個頭兩個大” is given as yíɡe tóu liǎnɡɡe dà (for Taiwan) and yīɡe tóu liǎnɡɡe dà (for China). But ge is supposed to be written separately. (The variation of tone for yi is in this case useful.)
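The apostrophe rule behind fàn’ān, chūn’ān, and fēi’ān is mechanical. Within a word, an apostrophe goes before any syllable other than the first that begins with a, o, or e. Here is a minimal Python sketch of that rule. It assumes the word has already been split into syllables, and the function names are hypothetical. It does not decide where word boundaries fall, and it does not add hyphens such as the one in dì-èr.

```python
import unicodedata

APOSTROPHE = "\u2019"  # the typographic apostrophe used in this post (fàn’ān)

def starts_with_a_o_e(syllable: str) -> bool:
    """True if the syllable begins with a, o, or e, with or without a tone mark."""
    # NFD decomposition turns "ā" into "a" + U+0304 COMBINING MACRON,
    # so the first code point is the bare vowel.
    first = unicodedata.normalize("NFD", syllable)[:1].lower()
    return first in ("a", "o", "e")

def join_word(syllables: list[str]) -> str:
    """Join one word's syllables, inserting an apostrophe before every
    non-initial syllable that starts with a, o, or e."""
    parts = []
    for i, syllable in enumerate(syllables):
        if i > 0 and starts_with_a_o_e(syllable):
            parts.append(APOSTROPHE)
        parts.append(syllable)
    return "".join(parts)

print(join_word(["fàn", "ān"]))     # apostrophe added
print(join_word(["pǎo", "shénr"]))  # no apostrophe needed
```

Because the check ignores tone marks, the same function also produces Rén’ài from ["Rén", "ài"].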

Still, my general impression from this is that we should not expect the forthcoming cross-strait dictionary to be very good.

Botanical descriptions: English at last

Camellia Japonica. Rose Camellia.

As of the beginning of this year, the International Code of Botanical Nomenclature no longer compels botanists to write descriptions of new species in Latin. Instead, they can opt for English, though the names of the species themselves will still be given in Latin.

As James S. Miller, dean and vice president for science at the New York Botanical Garden, noted in the New York Times earlier this week:

No longer will botanists have to write sentences like: “Arbor usque ad 6 m alta. Folia decidua; lamina oblanceolata vel elliptica-oblongata, 2-7 cm longa,” as I did in 2009, describing a new species from Mexico. Instead, I could simply write that Bourreria motaguensis was a six-meter-tall tree with deciduous leaves that were 2 to 7 centimeters long.

The change “will help to speed up the race to catalog the world’s plant life,” he added.

Elsewhere, plant biologist Jerrold Davis of Cornell University was quoted as saying that he did not think the permitted switch to English would speed things up. Nevertheless, he called the move an “overdue modernization.”

Mark Watson, of Edinburgh’s Royal Botanic Garden and secretary of the International Botanical Congress’ special committee on electronic publication, expressed the same sentiment, saying of the move away from Latin, “About time too,” adding that translation into Latin is not necessarily an easy task for researchers in countries such as China.

Davis noted, “The removal of the Latin requirement is an acceptance that English has become the language of science, and Latin has become an encumbrance rather than a facilitator of communication” [emphasis added].

Even if it does not accelerate the publishing process, the end of the Latin requirement may allow for greater inclusion of scientists from countries where education rarely includes instruction in classical languages. According to Sandra Knapp, a botanist with the Natural History Museum in London: “In places like Ethiopia, for example, people are finding it very difficult to write in Latin. But in reality everybody’s bad at it.” [emphasis added]

cover of the book _Latin, or, The Empire of a Sign_, by Françoise Waquet

In case you’re wondering why I’m writing about Latin and English on a blog that focuses mainly on Pinyin and Sinitic languages: I’m struck by the parallels between the position of Latin in the West (see Françoise Waquet’s terrific Latin, or, The Empire of a Sign: From the Sixteenth to the Twentieth Centuries) and (1) notions of Literary Sinitic (a.k.a. “classical Chinese”) in the Sinosphere and (2) beliefs in the efficacy of Chinese characters despite extensive problems with their near-exclusive use as a script for Mandarin. I’m pleased that scientists are no longer forced to follow a tradition that many found cumbersome and outdated.

Another noteworthy modernization in the world of scientific botany is that the International Code of Botanical Nomenclature now also recognizes publication in online academic journals as valid; print publication is no longer required.

Just a random sampling of Latin from Linnaeus -- this has nothing in particular to do with the flower above

just another random example of Latin in the field of botany

Pinyin font: Flexion Pro

Today I’d like to introduce a highly individualistic font family that supports Hanyu Pinyin: Flexion Pro. This one, however, isn’t free.

sample of the Flexion Pro font being used for text in Hanyu Pinyin

Flexion was originally designed for the movie The Da Vinci Code, which is apt, given that the main character, Robert Langdon, was named after Flexion’s designer (and ambigram specialist) John Langdon. Hal Taylor completed the font.

Here’s part of Taylor’s description:

Flexion is possibly the only symmetrical type design currently available. In keeping with John’s well-known propensity for ambigrams, many of the characters are mirrored to become other characters; the B is a reversed E, the C is reversed to become a D, G is a mirrored P, the K is a reversed N, and so on.

Of course, some of the tone marks will complicate making this work for ambigrams. But since Chinese-English bilingual ambigrams are possible, making Pinyin ambigrams, even with tone marks, shouldn’t be out of the question.

Flexion comes in four weights.

Platform on tai?

President Ma Ying-jeou’s re-election campaign slogan is “Táiwān jiāyóu,” and with the election only about two weeks away, it can be seen all around Taiwan these days.

The Ma campaign has decided that the English translation of “Táiwān jiāyóu” is “Taiwan, Bravo,” which isn’t quite right but at least sounds positive. Of Ma’s two opponents, Tsai Ing-wen (Cài Yīngwén / 蔡英文) of the anti-Hanyu-Pinyin Democratic Progressive Party chose the somewhat cryptic English slogan of “Taiwan next,” while third-party candidate James Soong (Sòng Chǔyú / 宋楚瑜) chose as his slogan “Me, me, me!”

OK, I made that last one up, but only because I couldn’t find the real one, other than maybe it’s “Renew.” (Does anyone know for sure?)

What I really want to talk about here, though, is how Ma’s slogan gets written: 台灣加油.

There is of course nothing unusual about that — except that Ma likes to make a big deal out of using traditional Chinese characters rather than simplified ones. Every year or so Ma talks about how he wants to get the United Nations to declare traditional Chinese characters a super-duper world something-or-other. He has already purged government Web sites of their simplified-character versions, which people in China and Singapore could read more easily than the traditional-character versions. And if he criticizes the PRC, it’s often to tell Beijing that people in China really ought to use traditional characters. Ma’s devotion to people in China being able to have traditional Hanzi reminds me of George W. Bush during the Hainan incident:

“Do the members of the crew have Bibles?” “Why don’t they have Bibles?” “Can we get them Bibles?” “Would they like Bibles?”

In other words, while that might be a concern, I sometimes wonder about his priorities.

By now a lot of you are probably thinking, “But 台 is one of those simplified characters that is not only OK to use in Taiwan but also by far more commonly seen than 臺. So what’s strange about this?”

That’s entirely correct. In most cases there would be nothing noteworthy about using “台灣加油” rather than “臺灣加油.” It seems entirely normal. What’s strange here is that the Ma administration actually has a position on the matter of 臺 vs. 台: Although the 台 form can be tolerated in some instances, 臺 is supposedly better and is mandatory in certain cases.

About a year ago, for example, the Ministry of Education reported that official government documents (gōngwén/公文) would have to use the 臺 form. And textbooks would need to be updated to change instances of 台灣, 台北, 台南, 台中, etc., to 臺灣, 臺北, 臺南, 臺中…. Webmasters of some government Web sites scurried to perform a whole lot of search-and-replace. There were not, however, so many instances of 台灣 to change to 臺灣 because Ma had already declared that in Mandarin pages “台灣” (Taiwan) was out and “中華民國” (Zhōnghuá Mínguó / the Republic of China) was in; so mainly this was visible in city names in addresses.
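Even that search-and-replace is less trivial than it sounds. 台 is also a standard traditional character in its own right (as in 天台, Tiāntāi), so a blanket 台→臺 swap would be wrong. Here is a minimal Python sketch of a targeted replacement. It assumes the list contains only the four place names mentioned above, whereas the real list was longer ("etc."), and the function name is hypothetical.

```python
# Only the place names mentioned in the post; a real list would be longer.
PLACE_NAMES = ("台灣", "台北", "台南", "台中")

def to_official_form(text: str) -> str:
    """Rewrite 台 as 臺, but only where it begins one of the listed place names."""
    for name in PLACE_NAMES:
        text = text.replace(name, "臺" + name[1:])
    return text

print(to_official_form("台灣加油"))  # the slogan in the 臺 form
print(to_official_form("天台山"))    # left alone: not a listed place name
```

Plain substring matching still misfires. In 舞台中央 ("center of the stage"), the 台中 that spans the two words gets rewritten too. This is one reason such cleanups tend to be left unfinished or done by hand.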

Predictably, though, lots never got changed. (“Close enough for government work.”)

Yes, I know: None of you are deeply shocked by the notion that a politician would tell people to do one thing but do something else himself. And the way the premier downplayed the policy makes me suspect many find it pointless or even embarrassing. Still, the fact remains that the administration did decide not to leave well enough alone and went out of its way to favor 臺 over 台.

Supposedly this is because after the Ministry of Education studied the origins of 臺 and 台, it decided that the tai in the name Taiwan should be written as 臺, according to Chen Hsueh-yu (Chén Xuěyù / 陳雪玉), executive secretary of the ministry’s National Languages Committee.

This doesn’t make much sense. Whichever form got used first — which is a dubious method for determining the correctness of usage for something now — the tai in Taiwan doesn’t have anything to do semantically with platforms, terraces, tables, stations, etc. In the case of the origin of the name of Taiwan, there’s no more meaning inherent in 臺 than there is in 台 — or than there is in the Roman letters Tai, either, for that matter. As Victor Mair has noted:

Superficially (according to the surface signification of the two characters with which the name is customarily written), “Taiwan” means “Terrace Bay.” That sounds nice, even poetic, but it is an inauthentic etymology and has nothing whatsoever to do with the actual origins of the name. (This is a typical instance of the common fallacy of wàngwénshēngyì 望文生義, whereby the semantic qualities of Chinese characters interfere with the real meanings of the terms that they are being used to transcribe phonetically.) The true derivation of the name “Taiwan” is actually from the ethnonym of a tribe in the southwest part of the island in the area around Ping’an. As early as 1636, a Dutch missionary referred to this group as Taiouwang. From the name of the tribe, the Portuguese called the area around Ping’an as Tayowan, Taiyowan, Tyovon, Teijoan, Toyouan, and so forth. Indeed, already in his ship’s log of 1622, the Dutchman Cornelis Reijersen referred to the area as Teijoan and Taiyowan. Ming and later visitors to the island employed a plethora of sinographic transcriptions to refer to the area (superficially meaning “Terrace Nest Bay” [Taiwowan 臺窝灣], “Big Bay” [Dawan 大灣], “Terrace Officer” [Taiyuan 臺員], “Big Officer” [Dayuan 大員], “Big Circle” [Dayuan 大圓], “Ladder Nest Bay” [Tiwowan 梯窝灣], and so forth). Some of these transcriptions are clever, others are fantastic, but none of them should be taken seriously for their meanings.

I’m not sure how best to characterize — sorry — the differences between “台灣加油” and “臺灣加油.” Although using the 臺 form would definitely come across as more formal, it wouldn’t be exactly the equivalent of “Fight Fiercely, Harvard.” Yet the use of the 台 form isn’t really the equivalent of a campaigning politician droppin’ his g’s either.

臺 vs. 台

Please don’t write to comment for or against simplified characters in general. This post isn’t about that really, even though 臺 could serve as a poster child for Hanzi simplification.

Yilan signage

Here are some signs in Yilan, which is in northeastern Taiwan.

As the examples below demonstrate, Yilan uses Hanyu Pinyin on its street signs. I saw only one old street sign in Tongyong Pinyin; this was through the window of a bus in motion, so I wasn’t able to get a photo.

Lane 2, Zhongshan Rd., Sec.5

Lane 180, Jinmei Rd.

It seems that Yilan has problems with apostrophes as well. These should, of course, read Xi’an.
Xian St.

Lane 1, Xian St.
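The apostrophe is needed because, without it, a toneless string like Xian can be read more than one way. Below is a Python sketch that lists every split of a toneless Pinyin string into syllables. It uses a tiny, hypothetical syllable list chosen only for these examples; a real tool would need the full inventory of a few hundred syllables.

```python
from functools import lru_cache

# Hypothetical toy inventory, just large enough for the examples below.
SYLLABLES = frozenset({"xi", "an", "xian", "re", "ren", "ai", "nai"})

def parses(word: str) -> list[list[str]]:
    """Return every way to split a toneless Pinyin string into known syllables."""

    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple[tuple[str, ...], ...]:
        if i == len(word):
            return ((),)  # one way to split the empty remainder: no syllables
        results = []
        for j in range(i + 1, len(word) + 1):
            head = word[i:j]
            if head in SYLLABLES:
                for tail in split_from(j):
                    results.append((head,) + tail)
        return tuple(results)

    return sorted(list(p) for p in split_from(0))

print(parses("xian"))   # one syllable, or xi + an
print(parses("renai"))  # re + nai, or ren + ai
```

The ambiguous Xian comes back as both xian and xi + an. The same ambiguity is why Renai on the Taipei maps should be written Ren’ai.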

In Taiwan, the vast majority of street names are two syllables long. Here’s a rare three-syllable name. I was told that the name comes from the company that constructed the irrigation channel parallel to the road. The sign — and even the name itself — is so new that it’s not in the current version of Google Maps.

Jintongchun Rd.

Some decorative signage.

Note the use of “WC”.
bas relief wood carving of area roads, with some buildings indicated

I don’t care much for Yilan’s rainy weather; but the city does have style. These signs, for example, are interesting — much more so than a failed attempt at a decorative sign in Tongyong Pinyin in Banqiao.
asymmetrical pieces of metal with Chinese characters punched out, revealing place names

The highway signs in Yilan, however, are in Tongyong Pinyin. This is a somewhat odd situation, given that highway signs belong to the national government, which is under the control of the KMT, which supports Hanyu Pinyin. Yilan is back in the DPP camp. (The Democratic Progressive Party continues to oppose Hanyu Pinyin and support Tongyong Pinyin.) The switch of street signs to Hanyu Pinyin was probably done under the previous magistrate, who was a member of the KMT.

I’m including this one despite the poor image quality because I want to note the awful typography (e.g., uneven baselines, capital letters too large).
Jiaosi Longtan Jhuangwei

Jiaosi Toucheng Sindian

Google introduces many new errors to Taipei-area maps

What on earth is going on over at Google?

Just last week I had nothing but love for Google Maps because it had finally made some important improvements to its maps of Taiwan. But just a few days later Google went and screwed up its maps again. The names of most of Taipei’s MRT stations are now written incorrectly. In most cases, this is merely a matter of form, with capitalization — and the important designation of MRT — missing. But in more than just a few instances some astonishing typos have been introduced. What’s especially puzzling and irksome about this is that in most of these cases Google Maps swapped good information for bad.

Meow tipped me off in a comment yesterday that “In Google Maps, Jiannan Rd. Station and Gangqian Station become Jianan road station and Ganggian station.”

Here’s a screenshot taken today of some MRT stations in Dazhi and Neihu:

As Meow said, Jiannan is written Jianan, and Gangqian is written Ganggian. What’s more, Dazhi is written Dachi, and Xihu is written His-Hu (Cupertino effect?).

There are now many such errors.

Here’s a screenshot taken last week.

And here’s the same place today.

As you can see, one of the instances of Jieyunsongjiangnanjing has been removed, which is good. But that’s the end of the good news. Another Jieyunsongjiangnanjing remains. And the one that was removed was replaced by Songjian nanjing station, with Songjiang misspelled and Nanjing and Station erroneously in lower case. And “MRT” is missing too.

It’s not just the station name that was changed, as the switch of one location from the Thai tourism office to the Panamanian embassy shows. (Perhaps both are in the same building.)

Here are some more examples of recently introduced errors.

Luchou should be Luzhou.
screenshot from Google maps showing 'Luchou' for 'Luzhou'

screenshot from Google maps showing 'Sun-yat-sen memorial hall station' for 'Sun Yat-sen Memorial Hall Station'

The westernmost station on the blue line is now labeled Tongning. The pain! The pain! It should be Yongning, which is also visible.
screenshot from Google maps showing 'Tongning' instead of 'Yongning'

In perhaps the oddest example, Qili’an, which has been miswritten Qilian for years, has been redesignated Chlian.
screenshot from Google maps showing 'Chlian' instead of 'Qili'an'

Above we saw Gangqian written incorrectly as Ganggian and Minquan written incorrectly as Minguan. Here’s another example of a q being turned into a g: Banqiao has become Bangiao. Even the train station, which belongs to a different rail system from the MRT, has been affected. But the High Speed Rail Station name remains in Tongyong Pinyin, which I most certainly disapprove of but which at least represents the current state of signage in the HSR system.
screenshot from Google maps showing 'Bangiao' instead of 'Banqiao'

Sloppy work, Google. Very sloppy. How could this have happened?
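A cheap safeguard against this kind of regression would be to compare each new label with a list of known-good names and flag anything that differs. The sketch below does this with Python's standard `difflib` module. The reference list, the function name, and the 0.8 similarity cutoff are all assumptions chosen for illustration; nothing here describes what Google actually uses.

```python
import difflib
from typing import Dict, List, Optional

def review(labels: List[str], reference: List[str],
           cutoff: float = 0.8) -> Dict[str, Optional[str]]:
    """Flag labels that are not in the reference list.

    Each flagged label maps to the most similar reference name, or to None
    when no reference name reaches the similarity cutoff.
    """
    known = set(reference)
    report: Dict[str, Optional[str]] = {}
    for label in labels:
        if label in known:
            continue  # already matches a known-good name
        best = difflib.get_close_matches(label, reference, n=1, cutoff=cutoff)
        report[label] = best[0] if best else None
    return report

reference = ["Gangqian", "Banqiao", "Yongning", "Minquan", "Luzhou", "Qili\u2019an"]
labels = ["Ganggian", "Bangiao", "Tongning", "Minguan", "Luchou", "Luzhou", "Chlian"]
for label, suggestion in review(labels, reference).items():
    print(label, "->", suggestion)
```

Single-letter slips such as q→g or Y→T are flagged along with the intended name. A mangling as thorough as Chlian for Qili’an falls below the cutoff, so it is flagged with no suggestion and would still need a human to look at it.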

Christ Avenue

I thought some of you might like this.

No, this isn’t an official city street sign. (For one thing, Taipei translates dàdào as boulevard, not avenue.) But Christ Avenue (Jīdū Dàdào / 基督大道) really is what Chinese Culture University in Taipei uses for one of its internal roads, though I didn’t find it in Google Maps.

photo of a large decorative road sign reading 'Christ Avenue / 基督大道' with 'Chinese Culture University' at the bottom

Or, if you’d rather walk a different road, you might try the campus’s Confucius Avenue.

Google improves its maps of Taiwan

Two years ago when Google switched to Hanyu Pinyin in its maps of Taiwan, it did a poor job … despite the welcome use of tone marks.

Here are some of the problems I noted at the time:

  • The Hanyu Pinyin is given as Bro Ken Syl La Bles. (Terrible! Also, this is a new style for Google Maps. Street names in Tongyong were styled properly: e.g., Minsheng, not Min Sheng.)
  • The names of MRT stations remain incorrectly presented. For example, what is referred to in all MRT stations and on all MRT maps as “NTU Hospital” is instead referred to in broken Pinyin as “Tái Dà Yī Yuàn” (in proper Pinyin this would be Tái-Dà Yīyuàn); and “Xindian City Hall” (or “Office” — bleah) is marked as Xīn Diàn Shì Gōng Suǒ (in proper Pinyin: “Xīndiàn Shìgōngsuǒ” or perhaps “Xīndiàn Shì Gōngsuǒ”). Most, but not all, MRT station names were already written this incorrect way (in Hanyu Pinyin rather than Tongyong) in Google Maps.
  • Errors in romanization point to sloppy conversions. For example, an MRT station in Banqiao is labeled Xīn Bù rather than as Xīnpǔ. (埔 is one of those many Chinese characters with multiple Mandarin pronunciations.)
  • Tongyong Pinyin is still used in the names of most cities and townships (e.g., Banciao, not Banqiao).
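Errors like Xīn Bù and Chéng Dōu are what a character-by-character conversion produces. Each Hanzi gets a single default reading, and each syllable becomes its own capitalized chunk. The Python sketch below contrasts that with a word-level pass. Both lookup tables are tiny, hypothetical toy data; a real converter would need a full lexicon. The word-level pass uses greedy longest match, which is one common but imperfect way to segment text.

```python
# Hypothetical toy data, for illustration only.
# One default reading per character, as a naive converter might store it.
CHAR_DEFAULT = {"新": "xīn", "埔": "bù", "成": "chéng", "都": "dōu"}
# Word-level readings that override the character defaults.
WORDS = {"新埔": ("xīn", "pǔ"), "成都": ("chéng", "dū")}
MAX_WORD_LEN = max(len(w) for w in WORDS)

def naive(text: str) -> str:
    """One default reading per character, each syllable capitalized separately."""
    return " ".join(CHAR_DEFAULT[ch].capitalize() for ch in text)

def parsed(text: str) -> str:
    """Prefer whole-word readings (greedy longest match), joining each word's syllables."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest multi-character word that fits first.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in WORDS:
                words.append("".join(WORDS[chunk]))
                i += size
                break
        else:
            # No word matched: fall back to the character's default reading.
            words.append(CHAR_DEFAULT[text[i]])
            i += 1
    return " ".join(w.capitalize() for w in words)

print(naive("新埔"), "|", parsed("新埔"))
print(naive("成都"), "|", parsed("成都"))
```

The word-level pass fixes both problems at once. Knowing that 新埔 is a single word supplies the right reading for the polyphonic 埔 and also keeps the syllables together (Xīnpǔ).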

I’m pleased to report that Google Maps has recently made substantial improvements.

First, and of fundamental importance, word parsing has finally been implemented for the most part. No more Bro Ken Syl La Bles. Hallelujah!

Here’s what this section of a map of Tainan looked like two years ago:

And here’s how it is now:

Oddly, “Jiànxīng Jr High School” has been changed to “Tainan Municipal Chien-Shing Jr High School Library” — which is wordy, misleading (library?), and in bastardized Wade-Giles (misspelled bastardized Wade-Giles, at that). And “Girl High School” still hasn’t been corrected to “Girls’ High School”. (We’ll also see that problem in the maps for Taipei.)

But for the most part things are much better, including — at last! — a correct apostrophe: Yǒu’ài St.

As these examples from Taipei show, the apostrophe isn’t just a one-off. Someone finally got this right.

Rén’ài, not Renai.
screenshot from Google Maps, showing how the correct Rén'ài (rather than the incorrect Renai) is used

Cháng’ān, not Changan.
screenshot from Google Maps, showing how the correct Cháng'ān is used

Well, for the most part right. Here we have the correct Dà’ān (and correct Ruì’ān) but also the incorrect Daan and Ta-An. But at least the street names are correct.
screenshot from Google Maps, showing how the correct Dà'ān (and correct Ruì'ān) is used but also the incorrect Daan and Ta-An

Second, MRT station names have been fixed … mostly. Almost all MRT station names are now in the mixture of romanization and English that Taipei uses, with Google Maps also unfortunately following even the incorrect ones. A lot of this was fixed long ago. The stops along the relatively new Luzhou line, however, are all written wrong, as one long string of Pinyin.

To match the style used for other stations, this should be MRT Songjiang Nanjing, not Jieyunsongjiangnanjing.
screenshot from Google Maps, showing how the Songjiang-Nanjing MRT station is labeled 'Jieyunsongjiangnanjing Station' (with tone marks)

Third, misreadings of poyinzi (pòyīnzì/破音字) have largely been corrected.

Chéngdū, not Chéng Dōu.
screenshot from Google Maps, showing how the correct 'Chéngdū Rd' is used

Like I said: have largely been corrected. Here we have the correct Chéngdū and Chóngqìng (rather than the previous maps’ Chéng Dōu and Zhòng Qìng) but also the incorrect Houbu instead of the correct Houpu.
screenshot from Google Maps, showing how the correct Chóngqìng Rd and Chéngdū St are used but also how the incorrect Houbu (instead of Houpu) is shown

But at least the major ones are correct.

Unfortunately, the fourth point I raised two years ago (Tongyong Pinyin instead of Hanyu Pinyin at the district and city levels) has still not been addressed. So Google is still providing Tongyong Pinyin rather than the official Hanyu Pinyin at some levels. Most of the names in this map, for example, are distinctly in Tongyong Pinyin (e.g., Lujhou, Sinjhuang, and Banciao, rather than Luzhou, Xinzhuang, and Banqiao).

Google did go in and change the labels on some places from city to district when Taiwan revised their names; but, oddly enough, the company didn’t fix the romanization at the same time. But with any luck we won’t have to wait so long before Google finally takes care of that too.

Or perhaps we’ll have a new president who will revive Tongyong Pinyin and Google will throw out all its good work.