ɑ vs. a

image of the rounded 'a' and the normal 'a' with the example given of the word 'Hanyu' (with tone marks)

About a year ago (which is roughly how overdue this post is), a commenter noted that some Chinese publishers “are convinced that Pinyin must be printed with ɑ (single-story „Latin alpha“, as opposed to double-story a), and with ɡ (single story; not double story g).”

But does Hanyu Pinyin in fact call for this longstanding Chinese habit of bad typography? This was one of the first questions I asked of Zhou Youguang, the father of Hanyu Pinyin, when I met with him: Are those who insist upon the ɑ-style letter correct?

“Oh, no,” Zhou replied. “That ‘ɑ’ is just for babies!” And he laughed that wonderful laugh of his that no doubt has contributed to his remarkable longevity.

Zhou was referring to the facts that the “ɑ” style of letter is usually found specifically in books for infants … and that this style generally does not belong elsewhere. In fact, ɑ and ɡ (written thusly, as opposed to g) are often referred to as infant characters. A variant of the letter y is sometimes included in this set.

Letters in that style are also found in the West — but almost always in books for toddlers, and often not even in those. Furthermore, even in those cases the use of such letters appears to have no positive effect on children’s reading.

The correct-style letters for Pinyin are the same as those for English, Zhou stated.

I hope that anyone who has been using “ɑ” will both officially and in practice switch to “a”. It’s long past time that the supposed rule calling for “ɑ” was treated as a dead letter.

Long live good typography!

persistent MPS2

Poagao sent me this photo of signs on Zhong’an Bridge, which joins Xindian and Zhonghe (both in Taipei County). (So the zhong is probably for Zhonghe; but I’m not sure what the an is meant to be short for.) The signs are a good illustration of the sloppy approach to romanization in Taiwan. Because this is a new bridge, these are definitely new signs and thus should be in Hanyu Pinyin, which is official not just in Taipei County but nationally.

two large directional signs above a road across a bridge, as described in this post

As the table below shows, however, the only name that definitely isn’t written in MPS2 — the romanization system that predated Tongyong, which in Taiwan predated Hanyu Pinyin — is a typo. MPS2 hasn’t been official for the better part of a decade.

on the sign    system                            Hanyu Pinyin
Junghe         MPS2                              Zhōnghé
Benchian       wrong in all systems              Bǎnqiáo
Jingping       (MPS2, Tongyong, Hanyu Pinyin)    Jǐngpíng
Shioulang      MPS2                              Xiùlǎng

And there’s no excuse for making “Shioulang Bridge” so small and squashed. This also brings to mind another aspect of Hanyu Pinyin: because of its design and the fact that it uses abbreviated forms of some vowel combinations (e.g., uei -> ui, iou -> iu), it doesn’t need as much horizontal space as MPS2 or Tongyong Pinyin, which means it can be written with larger letters — an important factor in signage. (See the second table of the comparative typing chart to see such differences between Hanyu Pinyin and Tongyong Pinyin.)

system             spelling
MPS2               Shioulang
Tongyong Pinyin    Sioulang
Hanyu Pinyin       Xiulang

history bite: around this time in 1977

Thirty-three years ago the third United Nations Conference on the Standardization of Geographical Names voted 43-1 in favor of adopting Hanyu Pinyin as “the international system for the romanization of Chinese geographical names,” which was a major step in establishing the use of Hanyu Pinyin internationally.

The one nay vote came from the United States, which said that changing the Library of Congress’s records from Wade-Giles to Pinyin would be prohibitively expensive. (The Library of Congress did not begin its Pinyin-conversion project until twenty years later.) This may also have had to do with the fact that at the time the United States did not recognize the People’s Republic of China but instead had diplomatic relations with the Republic of China (i.e., Taiwan), which didn’t adopt Hanyu Pinyin itself until more than thirty years later (and its implementation here is still incomplete).

1977 nián zài Yǎdiǎn jǔxíng de Liánhéguó dì-sān jiè dìmíng biāozhǔnhuà huìyì shàng, yǐ 43 piào zànchéng, 1 piào fǎnduì de jiéguǒ, tōngguò le cǎiyòng Hànyǔ Pīnyīn zuòwéi Zhōngguó dìmíng Luómǎ zìmǔ de guójì biāozhǔn de tí’àn. 1 piào fǎnduì de shì Měiguó. Jùshuō shì yīnwèi rúguǒ gǎiyòng Hànyǔ Pīnyīn, Měiguó Guóhuì Túshūguǎn jiāng “hàozī tài dà” (cǐ túshūguǎn de Zhōngwén shūkān míng yǐqián quán yòng Wēituǒmǎshì pīnyīn).

source: Xinhua Pinxie Cidian, by Yin Binyong.

X marks the spot?

In December Taiwan will be getting a new city. In fact, it will be the most populous city in the entire country: Xīnběi Shì (新北市).

For those not familiar with the situation, I should perhaps give a bit of background. Taiwan won’t suddenly have more people or buildings. Instead, the area known as Taipei County (which does not include the city of Taipei but which occupies a much greater area than Taipei and has a much greater total population) will be getting a long-overdue official upgrade to a “special municipality,” which means that it will get a lot more money and civil servants per capita from the central government. And as such the area will be dubbed a city, even though in appearance and demographic patterns it isn’t really a city at all but still a county containing several cities (which are to become “districts” despite having hundreds of thousands more inhabitants than some other places labeled “cities”), lots of towns, and plenty of empty countryside.

The Mandarin name will change from Táiběi Xiàn to Xīnběi Shì. (Xīn is the Mandarin word for “new.” Xiàn is “county.” Shì is “city.” And běi is “north.”) The official so-called English name is, tentatively, “Xinbei City.” Hanyu Pinyin! Yea!

Talking about “English” names is often misleading, since many people conflate English and romanization of Mandarin; and the usual pattern of Taiwanese place names not written in Chinese characters tends to be MANDARIN PROPER NAME + ENGLISH CATEGORY (e.g., “Taoyuan County”). So, at least in this post, I’m going to be a bit sloppy about what I’m calling “English.” Forgive me. OK, now back to the subject.

A couple of days ago, however, both major candidates for the powerful position of running the area currently known as Taipei County (Táiběi Xiàn) had a rare bit of agreement: both expressed a preference for using “New Taipei City” instead of “Xinbei City.” Ugh.

And to top things off, a couple dozen pro-Tongyong Pinyin protesters were outside Taipei County Hall the same day to protest against using Xinbei because it contains what they characterize as China’s demon letter X. Actually, that last bit of hyperbole isn’t all that much of an exaggeration of their position. The X makes it look like the city is being crossed out, some of the protesters claimed.

This is, of course, stupid. But unfortunately it’s the sort of stupidity that sometimes plays well here, given how this is a country that pandered to the superstitious by removing 4s from license plate numbers and ID cards and by changing the name of a subway line because if you cherry-picked from its syllables you could come up with a nickname that might remind people of a term for cheating in mah-jongg (májiàng). (Why bother with letting competent engineers do things the way they need to be done when problems can be fixed magically through attempts to eliminate puns!)

pro-Tongyong protesters hold up signs against using Hanyu Pinyin

The protesters would prefer the Tongyong form, Sinbei. I suspect foreigners here would rapidly change that to the English name “Sin City,” which I must admit would have a certain ring to it and might even be a tourist draw. Still, Tongyong has already done enough damage. Those wanting to promote Taiwan’s identity would be much better off channeling their energy into projects that might actually be useful to their cause.

The reason the government selected “Xinbei City” is that “New Taipei City” would be too similar to “Taipei City,” according to the head of the Taipei County Government’s Department of Civil Affairs. And, yes, they would be too similar. Also, Xinbei is simply the correct form in Hanyu Pinyin, which is Taiwan’s (and Taipei County’s) official romanization system. It would be better still to omit “city” altogether.

Consider how this might work on signs, keeping in mind that Taipei and Xīnběi Shì are right next to each other. So such similar names as “New Taipei City” and “Taipei City” would run the risk of confusion, unlike, say, the case of New Jersey and Jersey. I wonder if the candidates for mayor of Xinbei are under the impression that they should change the name of the town across from Danshui from Bālǐ to something else because visitors to Taiwan might otherwise think they could drive to the Indonesian island of Bali from northern Taiwan.

They probably said they liked “New Taipei City” better because it sounds “more English” to them. And it is more English than “Xinbei.” But that’s not a good thing.

Once again it may be necessary to point out what ought to be obvious: The reason so-called English place names are needed is not that foreigners need places to have names in the English language. If it were, I suppose we could redub many places with appropriate names in real English: “Ugly Dump Filled With Concrete Buildings” (with numbers appended so the many possibilities could be distinguished from each other), “Nuclear Waste Depository,” “Armpit of Taiwan,” “Beautiful Little Town that Turns Into a Tourist Hell on Weekends,” etc. The possibilities are endless, though perhaps some of the nicer places would need to be given awful names, following the Iceland/Greenland model, lest they be overrun. The problem is that Chinese characters are too damn hard, and people who can’t read them (i.e., most foreign residents and tourists) need to be able to find places on maps, on Web pages, through signs, etc. And they need to be able to communicate through speech with people in Taiwan about places. Having two different names (the Mandarin one and the so-called English one) is just confusing. Having one name in Mandarin written in two systems (Chinese characters and romanization), however, makes sense and works best. (If Taiwan were to switch to using Taiwanese instead of Mandarin, that would be a whole ’nother kettle of fish.)

But things that make sense and politicians don’t often fit well together.

Consider the signs. What a @#$% mess this could be. Let’s compare a few ramifications of using Xinbei and Taipei vs. using New Taipei City and Taipei City.

Xinbei and Taipei.

  • basically no chance of confusing one with the other
  • short (6 characters each), thus fitting better on signs
  • preexisting “Taipei [City]” signs wouldn’t have to be changed
  • Xinbei would be the correct romanization and not repeat the misleading pei of bastardized Wade-Giles
  • definitely no need to add “city” to either name, because there would be no “Taipei County” that might need to be distinguished from the city of Taipei, nor would there be a “Xinbei County” that would need to be distinguished from the city of Xinbei

Now let’s look at the case of New Taipei City and Taipei City.

  • relatively easy to confuse at a glance
  • relatively easy to confuse in general
  • long, and don’t fit as easily on signs (“New Taipei City” = 15 characters, including spaces; “Taipei City” = 11 characters, including the space)
  • “New Taipei City” would continue the ill-advised and outdated practice of using bastardized Wade-Giles spellings
  • any time the common adjective new needs to be applied to something dealing with “New Taipei City” or “Taipei City” the chances for confusion and mistakes would increase even more, esp. in headlines
  • the worst choice
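
The character counts in the two lists above are easy to check. Here is a minimal Python sketch; the variable names `names` and `lengths` are only for illustration, and spaces are counted because they take up room on a sign too.

```python
# Count the characters (spaces included) that each candidate name
# would occupy on a sign.
names = ["Xinbei", "Taipei", "New Taipei City", "Taipei City"]

lengths = {name: len(name) for name in names}

for name, count in lengths.items():
    print(f"{name}: {count} characters")
```

By this count “New Taipei City” needs two and a half times the width of “Xinbei” at the same letter size.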

The Taipei County Council will determine the final version of the name in September.

(By the way, if any Taiwan reporters want to pick up on this blog post, please don’t just follow the usual practice here of simply asking one or two random foreigners if they think the name “New Taipei City” sounds OK, so then you conclude that there’s no problem. Try to get people who’ve actually thought about the situation for more than a few seconds and who could give you an informed opinion. My apologies to those reporters who of course know better.)

sg domain names in Chinese characters lag

Between November 23, 2009, when Singapore first began registering .sg names in Chinese characters, and June 10, 2010, when registrations of Chinese-character .sg domain names opened to all without any additional fee, only 1,024 such names were registered, or just 0.88 percent of all .sg domain names. This apparently includes not just second-level domains (e.g., 中心.sg) but also third-level domains (e.g., 中心.com.sg).

The percentage will likely rise in the coming months, as the process has only recently opened to everyone on a first-come, first-served basis. But, still, demand for such names in Singapore has so far been underwhelming.

A bit more information:

Registrations were accepted in phases, with registrations for government organizations starting on Nov. 23, 2009. Beginning in January, SGNIC began accepting domain name registrations from trademark holders.

During the third phase, the general public was allowed to register domain names starting on March 25, but applicants were charged a “priority fee” of S$100 (US$72) for each domain name, with domain names sought by several applicants awarded to the highest bidder.

In all three phases, applicants could apply for a domain name made up of Chinese numbers or a name with just one Chinese character for a fee of S$500 [US$360]….

The fourth and final phase began on June 10, with SGNIC accepting domain name applications on a first-come, first-served basis. The S$100 priority fee is no longer required, but applicants are no longer allowed to register domain names using Chinese numbers or names with just one Chinese character….

When IDA announced the introduction of Chinese-language domain names last year, SGNIC said the effort was partly intended to help Singaporean businesses target the Chinese market.

source: Singapore registers 1,000 Chinese-language domain names, IDG News Service, June 23, 2010

Baidu adds handwriting input

Baidu has just added a function that allows people to use their mouse to write Chinese characters for searches.

On the Baidu home page, click on “手写” (shǒuxiě/手寫/handwrite).

This will bring up a pop-up box in which you can use your mouse to write Hanzi. This functions in basically the same way as the mouse-writing tool that Nciku added about two years ago.

source: Baidu.com’s Search Box Now Supports Chinese Handwriting Input, China Tech News, June 16, 2010

OMG, it’s Hanzified English

Taiwanese movie poster in Mandarin for 'Date Night', a.k.a. '約會喔麥尬'

In Taiwan, the new movie Date Night has been given the Mandarin title Yuēhuì o mài gà (約會喔麥尬/约会喔麦尬).

Yuēhuì is simply the word for “date.” The interesting part is “o mài gà” (喔麥尬), which is a Mandarinized form of the English “oh my god.” (I wonder if this, being written in Hanzi despite still being basically English, would pass China’s new need for supposed purity.)

Most people here — especially those younger than about 40 — would simply write “oh my god” (or, less frequently, “o my god”) in English in the middle of an otherwise Mandarin text. (I’ll spare everyone the chart of Google searches; but it backs this up.) But brevity is standard in movie titles here, and “喔麥尬” is a lot more compact on a movie poster than “oh my god.” This, however, raises the question of why “喔麥尬” instead of the equally concise “OMG”. I don’t know the answer to that. But the path of lettered words in Mandarin is certainly not without twists and turns.

Like most other uses of Hanzified English, the results are not entirely faithful to the original sounds.

Mandarin’s ou would be a closer phonetic fit than o for the English “oh”.
There’s Ōu (區/区), a surname. But most of the time this Chinese character is pronounced qū (being one of those many Chinese characters with multiple pronunciations), so that certainly wouldn’t work well. There’s ǒu, which has a more clearly phonetic Hanzi (嘔/呕), but which has to do with vomit (ǒutù/嘔吐/呕吐). Another possible choice would be ōu (歐/欧); but that is associated mainly with Europe and doesn’t get used much as a phonetic component in non-Europe-related loan words outside the word for ohm: ōumǔ (歐姆/欧姆).

Mài (the Mandarin word for wheat), unlike most other Mandarin morphemes pronounced mai (various tones), gets used phonetically in lots of various loan words, such as Màidāngláo (McDonald’s/麥當勞/麦当劳), Màijiā (Mecca/麥加/麦加), Dānmài (Denmark/丹麥/丹麦), and Kāmàilóng (Cameroon/喀麥隆/喀麦隆). So its use is to be expected, though semantically there’s no link. And mài is certainly a better fit for the English my than it is for the Mc of McDonald’s, the Mec of Mecca, the mark of Denmark, or the me of Cameroon.

For ga there’s not a lot of choice. 咖 is often seen in the phonetic loan gālí (curry). The biggest problem here is that the same 咖 is also used in a different, common phonetic loan: kāfēi (coffee). There’s 嘎; but, like 尬, it’s not exactly a well-known character.

Anyway, I could go on for a long time listing various possibilities. But the main point is that Chinese characters just don’t do well at this sort of thing.

As for Pinyin, I suppose the orthography could get interesting: o mài gà, o màigà, omài gà, or omàigà. But a Pinyin orthography would probably simply encourage people to write this in the original: oh my god.

BTW, you may wish to try the following experiment. The gà in o mài gà is most often seen in writing the word gāngà (尷尬/尴尬), which means awkward/embarrassed. Ask native speakers of Mandarin to write gāngà in Hanzi for you by hand without using a dictionary, a computer, or any other form of assistance. I bet that most people, even those with university degrees, won’t be able to write this common, ordinary word correctly.

And for lagniappe, the character 尬 is also sometimes seen in written Taiwanese as the equivalent of Mandarin’s jiā (加/add). I spotted an example of this just the other day on a cafe sign (in the sense of “buy something and ga something else for a special price”) but didn’t have a camera with me.

Combining Pinyin and Chinese character subtitles

With any luck, this will be the last post for some time in my none too exciting but hopefully useful series on technical aspects of creating Pinyin subtitles.

Some people like to have Pinyin subtitles and Hanzi subtitles appear at the same time. Although I think that’s generally a bad idea (too much text to get through quickly that way, people would benefit from becoming accustomed to reading Pinyin texts as Pinyin texts, etc.), I’ll go ahead and offer instructions on how to make Pinyin subtitles appear above Chinese character subtitles.

These directions are for Microsoft Word, though other programs could be used instead.

Using Word, open copies of the two subtitle files you’d like to combine.

To get the alignment between the two files to match when they’re combined, it’s important that each subtitle entry is only one line long. You can check for possible instances of multi-line subtitles with a wildcard search (CTRL+H –> More –> Use wildcards).

Find what (with “Use wildcards” checked):
([!0-9])^13([!0-9^13])

If that search finds any multi-line subtitles, you’ll need to temporarily adjust those lines in both subtitle files, as follows:

Find what (with “Use wildcards” checked):
([!0-9])^13([!0-9^13])

Replace with:
\1|\2

Again, be sure to run that search-and-replace in both subtitle files. You’ll replace the “|” with a RETURN later.
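
For anyone who would rather script this step than run it in Word, here is a rough Python sketch of the same idea. It is not a translation of the wildcard search: it assumes a well-formed SRT file in which entries are separated by blank lines and each entry has a number line, a timing line, and then the text. The function name `join_multiline_entries` is made up for this example.

```python
import re

def join_multiline_entries(srt_text, marker="|"):
    """Put each subtitle's text on one line, joining extra lines with `marker`."""
    # Normalize Windows line endings, then split into blank-line-separated entries.
    blocks = re.split(r"\n{2,}", srt_text.replace("\r\n", "\n").strip())
    joined = []
    for block in blocks:
        lines = block.split("\n")
        # lines[0] is the entry number, lines[1] the timing; the rest is text.
        if len(lines) > 3:
            lines = lines[:2] + [marker.join(lines[2:])]
        joined.append("\n".join(lines))
    return "\n\n".join(joined) + "\n"
```

Because this relies on the entry structure rather than on whether a line begins or ends with a digit, it should also catch multi-line subtitles whose text starts or ends with a number. Splitting on the marker later restores the original line breaks.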

Next, in the file with the Chinese characters (not the Pinyin file) strip out everything except for the text of the subtitles, leaving just the Hanzi text. (I wrote about this earlier in How to strip subtitle files down to text. The method is also useful for removing such information if you want to create the text of the screenplay.)

Find what (with “Use wildcards” checked):
^13[0-9:\,\-\> ]{1,}^13

Replace with:
^p

Note: You may need to run the above “replace all” twice for Word to catch everything.
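
The same stripping can be sketched outside Word with a regular expression. This is only an approximation of the wildcard search above, under the assumption that number and timing lines contain nothing but digits, colons, commas, hyphens, ">" and spaces. The names `NON_TEXT` and `strip_to_text` are invented for this example.

```python
import re

# Matches lines made up only of digits, colons, commas, hyphens, ">" and spaces,
# i.e. the entry numbers and timing lines of an SRT file.
NON_TEXT = re.compile(r"[0-9:,\-> ]+")

def strip_to_text(srt_text):
    """Return only the subtitle text lines, dropping numbers, timings, and blanks."""
    kept = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        if stripped and not NON_TEXT.fullmatch(stripped):
            kept.append(line)
    return kept
```

Like the Word search, this would also throw away a subtitle whose entire text is a number (a year, say), so it is worth checking that the count of lines returned matches the number of entries.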

You should have something that looks like this (with paragraph marks shown):


喲! 李爺來啦¶

李爺來啦¶

秀蓮¶

秀蓮¶

秀蓮,李慕白來啦¶

Now add extra lines, so the lines with Chinese characters will fit into the new document in the correct places.

Find what (with “Use wildcards” checked):
^13^13

Replace with:
^p^p^p^p^p

Delete the very first line (the one with the “1” in it). Then add three blank lines above this.

You should have something that looks like this (with paragraph marks shown):




喲! 李爺來啦¶




李爺來啦¶




秀蓮¶

Select all (CTRL+A). Then convert this to a table:
Table –> Convert –> Text to Table

Now switch to the Pinyin subtitles file.

First, add the extra blank lines into which you will later insert the Chinese characters that correspond with the Pinyin.

Find what (with “Use wildcards” checked):
^13^13

Replace with:
^p^p^p

Convert the Pinyin subtitles to a table:
CTRL+A
Table –> Convert –> Text to Table

Switch back to the Chinese character file. Copy the table there and paste it to the right of the table with the Pinyin text.

You should have something that looks like this:

1  
00:00:49,000 --> 00:00:51,500  
Yō! Lǐ yé lái la  
  喲! 李爺來啦
   
2  
00:00:52,200 --> 00:00:53,600  
Lǐ yé lái la  
  李爺來啦
   
3  
00:01:06,900 --> 00:01:08,400  
Xiùlián  
  秀蓮
   
4  
00:01:09,000 --> 00:01:10,400  
Xiùlián  
  秀蓮

Next, change this back into text:
Table –> Convert –> Table to Text

Remove the tabs:
Find what:
^t

Replace with:
[leave blank]

If you combined any lines earlier, break them apart now:
Find what:
|

Replace with:
^p

Your document should now look like this:

1
00:00:49,000 --> 00:00:51,500
Yō! Lǐ yé lái la
喲! 李爺來啦

2
00:00:52,200 --> 00:00:53,600
Lǐ yé lái la
李爺來啦

3
00:01:06,900 --> 00:01:08,400
Xiùlián
秀蓮

4
00:01:09,000 --> 00:01:10,400
Xiùlián
秀蓮

Save the file as plain text (*.txt), not as a Word document (*.doc). Then later rename this to give it the correct file extension (probably *.srt).
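
If you are comfortable with a little scripting, the whole procedure can also be approximated in a few lines of Python instead of Word. What follows is a minimal sketch, not the method described above. It assumes both files are well-formed SRT with the same number of entries in the same order, and it takes the numbers and timings from the Pinyin file. The function names `parse_srt` and `combine` are invented for this example.

```python
import re

def parse_srt(srt_text):
    """Split SRT text into (number, timing, text_lines) tuples."""
    entries = []
    for block in re.split(r"\n{2,}", srt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        entries.append((lines[0], lines[1], lines[2:]))
    return entries

def combine(pinyin_srt, hanzi_srt):
    """Return SRT text with each Pinyin subtitle placed above its Hanzi subtitle."""
    pinyin = parse_srt(pinyin_srt)
    hanzi = parse_srt(hanzi_srt)
    if len(pinyin) != len(hanzi):
        raise ValueError("the two files have different numbers of entries")
    blocks = []
    for (number, timing, p_lines), (_, _, h_lines) in zip(pinyin, hanzi):
        # Numbers and timings come from the Pinyin file; Hanzi text goes underneath.
        blocks.append("\n".join([number, timing, *p_lines, *h_lines]))
    return "\n\n".join(blocks) + "\n"
```

To use it, read both files as UTF-8 text, pass them to `combine`, and write the result out as UTF-8 with the .srt extension. Multi-line subtitles need no temporary “|” marker here, since each entry's lines are kept together.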
