separating Pinyin syllables: PHP code

A few weeks ago I had someone write to ask if I had a script that can divide Pinyin texts into their individual syllables. It so happens that I do have something that does just that. Since I sent out that bit of code, I might as well make it available to everyone (GNU GPL, and links back to Pinyin.Info are always appreciated).

It has lots of regular expressions, to make the code nice and compact. I’ve added comments for clarity.

##############################
### SEPARATE THE SYLLABLES
##############################
// In the patterns below, \s matches any whitespace character
// This program assumes that ü is written as v
// The i after the closing delimiter of each pattern makes it case insensitive
// \W is a single, non-word character (e.g., punctuation)

$search = array ("'([aeiouv])([^aeiounr\W\s])'i", // This line does most of the work
"'(\w)([csz]h)'i", // double-consonant initials
"'(n)([^aeiouvg\W\s])'i", // cleans up most n compounds
"'([aeiuov])([^aeiou\W\s])([aeiuov])'i", // assumes correct Pinyin (i.e., no missing apostrophes)
"'([aeiouv])(n)(g)([aeiouv])'i", // assumes correct Pinyin, i.e. changan = chan + gan
"'([gr])([^aeiou\W\s])'i", // fixes -ng and -r finals not followed by vowels
"'([^e\W\s])(r)'i", // r an initial, except in er
);

$replace = array ("\\1 \\2",
"\\1 \\2",
"\\1 \\2",
"\\1 \\2\\3",
"\\1\\2 \\3\\4",
"\\1 \\2",
"\\1 \\2",
);

$usertext = preg_replace($search, $replace, $document); // $document holds the Pinyin text to be split

##############################
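
As a quick check of how the patterns behave, the snippet can be wrapped in a function and run on a few words. This is only a sketch: the function name separate_syllables() and the sample words are assumptions added for illustration, not part of the original script. The patterns and replacements are copied unchanged from above.

```php
<?php
// Hypothetical wrapper; the name separate_syllables() is an assumption,
// not something defined in the original script.
function separate_syllables(string $document): string
{
    $search = array(
        "'([aeiouv])([^aeiounr\W\s])'i",          // does most of the work
        "'(\w)([csz]h)'i",                        // double-consonant initials
        "'(n)([^aeiouvg\W\s])'i",                 // cleans up most n compounds
        "'([aeiuov])([^aeiou\W\s])([aeiuov])'i",  // assumes no missing apostrophes
        "'([aeiouv])(n)(g)([aeiouv])'i",          // e.g. changan = chan + gan
        "'([gr])([^aeiou\W\s])'i",                // -ng and -r finals before consonants
        "'([^e\W\s])(r)'i",                       // r as an initial, except in er
    );

    $replace = array(
        "\\1 \\2",
        "\\1 \\2",
        "\\1 \\2",
        "\\1 \\2\\3",
        "\\1\\2 \\3\\4",
        "\\1 \\2",
        "\\1 \\2",
    );

    // The patterns are applied in order, each to the result of the previous one.
    return (string) preg_replace($search, $replace, $document);
}

// Sample words (chosen for illustration only).
foreach (array("zhongguo", "pinyin", "changan", "Beijing") as $word) {
    echo $word, " => ", separate_syllables($word), "\n";
}
// zhongguo => zhong guo
// pinyin => pin yin
// changan => chan gan
// Beijing => Bei jing
```

Because the patterns run one after another, their order matters: the "changan" case only comes out as "chan gan" because the fifth pattern sees the string before the sixth one does.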

Since I’m always going on about the need for word parsing and not separating Pinyin into single syllables, some of you are probably wondering just why I of all people would have ever written such code. The answer is that it’s part of my Pinyin spell-checker, which is only a very basic utility in that it functions by checking for theoretically correct groups of syllables rather than real words (i.e., anything composed of correctly spelled groups of syllables, minus tone marks, will pass even if that word isn’t found in a dictionary).
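
The kind of check described above can be sketched roughly as follows. This is not the actual spell-checker: the function name find_invalid_syllables() and the tiny $validSyllables list are made up for illustration, and a real list would have to cover every legal Pinyin syllable. The input is assumed to be already separated into syllables, with tone marks removed and ü written as v.

```php
<?php
// Hypothetical helper: returns the syllables that are NOT in the list of
// valid syllables. Both the name and the sample list are assumptions.
function find_invalid_syllables(string $separatedText, array $validSyllables): array
{
    // Build a lower-case lookup table for quick membership tests.
    $lookup = array_flip(array_map('strtolower', $validSyllables));

    // Split on anything that is not an ASCII letter (spaces, punctuation, digits).
    $syllables = preg_split('/[^a-z]+/i', $separatedText, -1, PREG_SPLIT_NO_EMPTY);

    $invalid = array();
    foreach ($syllables as $syllable) {
        if (!isset($lookup[strtolower($syllable)])) {
            $invalid[] = $syllable;
        }
    }
    return $invalid;
}

// A deliberately tiny sample list; a real checker needs the full syllable inventory.
$validSyllables = array("zhong", "guo", "pin", "yin", "han", "yu");

print_r(find_invalid_syllables("Zhong guo pin yinn", $validSyllables));
// Only "yinn" is reported, since it is not in the sample list.
```

Note that a check like this says nothing about whether the syllables form a real word, which is exactly the limitation mentioned above.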

Suggestions for improvements are always welcome.

Ovid Tzeng reiterates backing for Hanyu Pinyin

Earlier this week Ovid Tzeng, a former minister of education and current minister without portfolio, reaffirmed his support for Taiwan adopting Hanyu Pinyin and said that this is an important issue the government will need to deal with sooner or later.

Zēng Zhìlǎng Jiàoyùbù zhǎng rènnèi, jiānchí cǎiyòng Hànyǔ Pīnyīn, shì tā bèi huàn xiàlái de zhǔyīn zhīyī. Tā zuótiān réng bù gǎi qí zhì, qiángdiào guówài bùguǎn Zhōngwén jiàoxué huò xuéshù qīkān, hěn duō yǐjing gǎiyòng Hànyǔ Pīnyīn, Táiwān bùnéng shìruòwúdǔ, zhè suī fēi xīn zhèngfǔ zuì yōuxiān shīzhèng xiàngmù, dànshì yě lièwéi wèilái zhòngdà jiǎntǎo shìxiàng.

Most of the source article for this discusses poet and academic Zheng Chouyu’s backing for Hanyu Pinyin. He stresses his view that this is a practical matter, not a political one.

source: Zhèng Chóuyǔ jiànyì: Zhōngwén yìyīn kěyǐ cǎi Hànyǔ Pīnyīn (鄭愁予建議:中文譯音 可採漢語拼音), United Daily News, June 9, 2009

further reading: Hanyu Pinyin backer to return to Taiwan’s Cabinet, Pinyin News, April 29, 2008

Whither Taiwan’s English renamings?

Those working in the new administration of President Ma Ying-jeou (Mǎ Yīngjiǔ) are people with priorities. For example, they certainly didn’t waste any time removing the Chinese characters for “Taiwan” from the Web site of the presidential office, as this happened on his first day in office. On the other hand, they didn’t bother with other things, like having the current year be 2008 instead of “108.”

From a screen shot taken a couple of nights ago:
screenshot from the website of the Office of the President, showing that the date script *still* hasn't been fixed (with the year given as '108' instead of '2008')

From a screen shot taken about two-and-a-half years ago:
screenshot from the website of the Office of the President, showing the date as '106-01-02' for January 2, 2006

(FWIW, I told a meeting of government webmasters three years ago that the date script needed fixing — or, better still, deletion. Are they really under the impression that lots of people visit the presidential office’s Web site or that of any other Taiwan governmental agency to check the date and time?)

Also, given what the head of the ruling party recently said in the glorious motherland China, perhaps they might want to replace “Office of the President” with “Office of Mr. Ma.”

At any rate, how things are named is a concern of the current administration, just as it was for the previous one. I’ve given up trying to follow the twists and turns of the name of Revere the Bloody Dictator Shrine / Chiang Kai-shek Memorial Hall / Taiwan Democracy Memorial Hall. Someone let me know when the dust finally settles.

And then there’s the airport. The last time I was on a highway in Taoyuan I noticed that the signs that previously said “CKS Airport” had the “CKS” covered, so they read simply “Airport”. Maybe the new administration can live with that, regardless of what it does about the signage of the airport itself.

But what is to become of the official names that weren’t changed in Mandarin but only in English? Please note that I’m not talking about romanizations but about real English names. I’m referring to how the English names of several ministries and other government agencies were changed during President Chen Shui-bian’s two terms in office, though the Mandarin names remained the same.

For example:

| Mandarin name | English name, pre-DPP | English name, current (March 2008) |
| --- | --- | --- |
| Yuánzhùmín Wěiyuánhuì | Council of Aboriginal Affairs | Council of Indigenous Peoples |
| Guóyǔhuì | Mandarin Promotion Council | National Languages Committee |
| Zhōnghuá Mínguó Duìwài Màoyì Fāzhǎn Xiéhuì | China External Trade Development Council (CETRA) | Taiwan External Trade Development Council (TAITRA) |
| Qiáowù Wěiyuánhuì | Overseas Chinese Affairs Commission | Overseas Compatriot Affairs Commission |

None of the above revised names have been revoked or changed as of today (June 12, 2008 — or 108-06-12, as the Presidential Office would have it).

What about the addresses of the Web sites of these ministries and agencies?

| Name | URL | Comments |
| --- | --- | --- |
| Council of Indigenous Peoples | www.apc.gov.tw | APC? According to someone I spoke with at the council, this stands for “Aboriginal People’s Commission” (or maybe “Aboriginal Peoples’ Commission”), a name that dates back to 1996. But I can’t find any search results for that name within .tw domains. Also, neither www.cip.gov.tw nor www.cip.gov.tw leads to anything. But lately the APC site has often been unresponsive. I mentioned to the council that they might want an updated URL; the person I spoke with said she’d look into it. |
| National Languages Committee | www.edu.tw/MANDR/ | This is under the Ministry of Education, which has changed the URL a few times over the years but has yet to revise the focus in the address on Mandarin (i.e., “MANDR”). Not even under the DPP was this address subject to rectification (zhèngmíng, 正名). |
| Taiwan External Trade Development Council (TAITRA) | www.taitra.org.tw | The old URL of www.cetra.org.tw leads to nothing, not even a redirect. www.taitra.com.tw mirrors the .org.tw address. This doesn’t have a .gov.tw address because it’s a semi-governmental organization. |
| Overseas Compatriot Affairs Commission | www.ocac.gov.tw | “Overseas Chinese Affairs Commission” and “Overseas Compatriot Affairs Commission” share the same abbreviation. One URL fits all. |

Thus, so far the new English names have survived.

early Chinese tattoos

As my friend Tian of Hanzi Smatter continues to document, some people, Westerners especially, remain keen on having themselves tattooed with Chinese characters — even if they can’t read them. I doubt, though, that many are aware of China’s historical traditions in tattooing. As Carrie E. Reed notes in Early Chinese Tattoo (2.9 MB PDF), which is the latest reissue from Sino-Platonic Papers, “it appears that the practice of tattoo (other than the penal use) never achieved any level of general acceptance or widespread use among most parts of ancient Chinese society of any era.”

Yes, penal use: In early China tattooing was a common way of branding criminals. Often such tattoos were standard designs, such as circles. But sometimes they contained text.

Here’s something from Reed’s discussion of the Yuan dynasty’s legal code:

In the section on illicit sexual relationships we read that, in general, on the first offense the adulterous couple will be separated, but if they are “caught in the act” a second time, the man (it is not clear if the woman is tattooed as well) will be tattooed on the face with the words “committed licentious acts two times” (犯姦二度) and banished. Numerous examples are given to illustrate this type of punishment.

Reed examines and translates many texts describing tattoos.

Some of the terms encountered in these early texts are (with a literal translation given in parentheses) qing 黥 (to brand, tattoo), mo 墨 (to ink), ci qing 刺青 (to pierce [and make] blue-green), wen shen 文身 (to pattern the body), diao qing 雕青 (to carve and [make] blue-green), ju yan 沮顏 (to injure the countenance), wen mian 文面 (to pattern the face), li mian 剺面 (to cut the face), hua mian 畫面 (to mark the face), lou shen 鏤身 (to engrave the body), lou ti 鏤體 (same), xiu mian 繡面 (to embroider [or ornament] the face), ke nie 刻涅 (to cut [and] blacken), nie zi 涅字 (to blacken characters), ci zi 刺字 (to pierce characters), and so on. These terms are sometimes used together, and there are numerous further variations. In general, if the tattooing of characters (字) appears in the term, it refers to punishment, but this is certainly not true in every case. Likewise, if a term literally meaning “to ornament” or “decorate” is used, it does not necessarily mean that the tattoo was done voluntarily or for decorative purposes.

All of the types of tattoo, except perhaps for the figurative and textual, are usually described as inherently opprobrious; people bearing them are stigmatized as impure, defiled, shameful or uncivilized. There does not ever seem to have been a widespread acceptance of tattoo of any type by the “mainstream” society; this was inevitable, partly due to the early and long-lasting association of body marking with peoples perceived as barbaric, or with punishment and the inevitable subsequent ostracism from the society of law-abiding people. Another reason, of course, is the Confucian belief that the body of a filial person is meant to be maintained as it was given to one by one’s parents.

This was first published in June 2000 as issue no. 103 of Sino-Platonic Papers. Although the work contains no illustrations, it does feature copious translations of texts describing tattoos or relating tales about them.

Gaoxiong street signs

Sinle St

During an extremely brief trip a few weeks ago to Gāoxióng, Taiwan’s second-largest city, I was able to grab a few photos of signage there. Most of these were taken from a moving taxi; thus the poor quality and lack of much diversity. But these are the best I could do under the circumstances.

First, a few basic points:

  • they’re in Tongyong Pinyin (bleah — but at least they’re consistent)
  • they don’t use InTerCaPiTaLiZaTion (This lack is, of course, a good thing. If only Taipei hadn’t screwed this up!)
  • in most cases the text in romanization is large enough to read even at a distance (Very good, unlike all too many relatively recent signs elsewhere, such as those in Taipei County.)

In short, other than the choice of romanization most of these signs aren’t all that bad. They’re certainly much better (and more consistent) than the ones that Taipei County put up in Tongyong Pinyin a few years ago. (Although Taipei County’s current magistrate said more than two years ago that he was in favor of switching to Hanyu Pinyin, as far as I can see he has done absolutely nothing about this. Of course, some might say that he’s done absolutely nothing about anything; but I’ll leave discussion of that to the political blogs.)

Here’s another Gāoxióng sign with romanization that isn’t too small.
Dacheng St.

I’m not a fan of the practice of force-justifying the Chinese characters and romanization/English to the same width. This style can be seen in many of these signs. Sometimes this results in the romanized/English words being spaced too far apart; more often, though, the Chinese characters are left with lots of space between them — so much space that it would be easy to have spaces indicate word divisions for the texts in Hanzi (something Y.R. Chao recommended nearly a century ago), which might be an interesting thing to try on signs. I wonder if anyone has ever performed any experiments on this.

The full Mandarin name of the school indicated by the blue sign on the left is rather long:

Gāoxióng shìlì Gāoxióng nǚzǐ gāojí zhōngxué
(高雄市立高雄女子高級中學)

Whoever made the sign wisely decided to cut that down to 高雄女中 (Gāoxióng nǚ zhōng). If only someone had realized that it would have been better to use something shorter than the full English name, too. “Kaohsiung Municipal Girls’ Senior High School” is a lot to fit on one small sign. “Kaohsiung Girls’ High School”, “Girls’ Municipal High School”, or something even shorter would have been much better.

Here are some more signs.

And finally an address plate on a building. This style could certainly be better.
Dayi St.

Book reviews, vol. 6

Sino-Platonic Papers has rereleased for free its sixth volume of reviews, mainly of books about China and its history and languages (5.6 MB PDF).

The reviews are by David Utz, Xinru Liu, Taylor Carman, Bryan Van Norden, and Victor H. Mair.

Contents

  • Review Article by David A. Utz of Ádám Molnár, Weather-Magic in Inner Asia. With an Appendix, “Alttürkische fragmente über den Regenstein,” by P. Zieme. Indiana University Uralic and Altaic Series, 158. Bloomington, Indiana: Indiana University, 1994.
  • Graham Parkes, ed., Heidegger and Asian Thought. Honolulu: University of Hawaii Press, 1987. Reviewed by Taylor Carman and Bryan Van Norden.
  • Beijing Daxue Nanya Yanjiusuo [Peking University Institute for South Asian Studies], ed. Zhongguo zaiji zhong Nanya shiliao huibian (Collection of South Asian Historical Materials from Chinese Sources). 2 vols. Shanghai: Shanghai Guji Chubanshe, 1995. Reviewed by Xinru Liu.

The following 23 reviews are by the editor of Sino-Platonic Papers.

  • Ronald E. Emmerick and Edwin G. Pulleyblank. A Chinese Text in Central Asian Brahmi Script: New Evidence for the Pronunciation of Late Middle Chinese and Khotanese. Serie Orientale Roma, LXIX. Rome: Istituto Italiano per il Medio ed Estremo Oriente, 1993.
  • YIN Binyong and SU Peicheng, eds. Kexuede pingjia Hanyu hanzi [Scientifically Appraise Sinitic and Sinographs]. Zhongguo yuwen xiandaihua congshu (Chinese Language Modernization Series), 1. Peking: Huayu Jiaoxue Chubanshe (Sinolingua), 1994.
  • WU Chang’an. Wenzi de toushi — Hanzi lunheng [A Perspective on Culture — Balanced Discussions on the Sinographs]. Wenhua Yuyanxue Congshu [Cultural Linguistics Series]. N.p. (Changchun?): Jilin Jiaoyu Chubanshe, 1995.
  • ZHOU Shilie, comp. Tongxingci cidian [Dictionary of Homographs]. Peking: Zhongguo Guoji Guangbo Chubanshe, 1995. (Reviewed twice from different perspectives in the same issue.)
  • KANG Yin. Wenzi Yuanliu Qianshi (The Origin and Development of Chinese Ideographs) (sic). N.p.: Guoji Wenhua Chubanshe, 1992.
  • DUAN Kailian. Zhongguo minjian fangyan cidian [A Dictionary of Chinese Folk Topolecticisms]. Haikou: Nanhai chuban gongsi, 1994.
  • CHANG Xizhen, comp. Beiping tuhua [Peking Colloquialisms]. Taipei: Shenge Shiye Youxian Gongsi Chubanshe, 1990.
  • ZHANG Xunru. Beiping yinxi xiaoche bian [A Compilation of Words with “er” Suffix in Pekingese]. Taipei: Taiwan Kaiming, 1991; 2nd Taiwan ed.; 1956, first Taiwan ed.
  • LI Sijing. Hanyu “er” [] yin shi yanjiu [Studies on the History of the “er” [] Sound in Sinitic]. Taipei: Taiwan Shangwu, 1994.
  • Erdengtai, Wuyundalai, and Asalatu. Menggu mishi cihui xuanshi [Selected Explanations of Lexical Items in The Secret History of the Mongols]. Mengguzu lishi congshu [Series on the History of the Mongolian People]. Hohhot: Neimenggu Renmin Chubanshe, 1980; 1991 rpt.
  • Matthews, Stephen and Virginia Yip. Cantonese: A Comprehensive Grammar. Routledge Grammars. London and New York: Routledge, 1994.
  • Killingley, Siew-Yue. Cantonese. Languages of the World / Materials 06. München-Newcastle: Lincom Europa, 1993.
  • ZHONG Jingwen, chief ed. Yuhai (An Encyclopedia of Chinese Folk Language), Vol. 1: Mimiyu (Chinese Secret Language). Vol. editors ZHENG Shuoren and CHEN Qi. Shanghai: Shanghai Wenyi Chubanshe, 1994.
  • Harrell, Stevan, ed. Cultural Encounters on China’s Ethnic Frontiers. Seattle and London: University of Washington Press, 1995.
  • Woo, Henry K. H. The Making of a New Chinese Mind: Intellectuality and the Future of China. Hong Kong: China Foundation, 1993.
  • Miller, Lucien, ed. South of the Clouds: Tales from Yunnan. Translated by GUO Xu, Lucien Miller, and XU Kun. Seattle and London: University of Washington Press, 1994.
  • Hoizey, Dominique and Marie-Joseph Hoizey. A History of Chinese Medicine. Tr. by Paul Bailey. Vancouver: UBC Press; Edinburgh: Edinburgh University Press, 1993.
  • Crystal, David. An Encyclopedic Dictionary of Language and Languages. London: Penguin, 1992, 1994.
  • Day, Gordon M. Western Abenaki Dictionary. Vol. 1: Abenaki-English. Vol. 2: English-Abenaki. Mercury Series, Canadian Ethnology Service, Papers 128 and 129. Hull, Quebec: Canadian Museum of Civilization, 1994-95.
  • Hassrick, Peter H. The Frederic Remington Studio. Cody, Wyoming: Buffalo Bill Historical Center, in association with University of Washington Press (Seattle, London), 1994.
  • Jonaitis, Aldona, ed. Chiefly Feasts: The Enduring Kwakiutl Potlatch. Seattle and London: University of Washington Press; New York: American Museum of Natural History, 1991.
  • Jerry L. Norman and W. South Coblin. “A New Approach to Chinese Historical Linguistics.” Journal of the American Oriental Society, 115.4 (1995), 576-584.

Bits and Pieces

  • Letter concerning An Zhimin’s views on the origins of bronze metallurgy in China.
  • “Yet again on Tibet.” This is one in a continuing series of discussions with Edwin G. Pulleyblank, W. South Coblin, and others on the origins of the name “Tibet”.

This was first published in February 1996 as issue no. 70 of Sino-Platonic Papers.

Burger King, romanization, and a Taiwanese morpheme

Last week I was at my neighborhood shopping mall and saw an interesting ad in the Burger King there. Toward the bottom we find the following:
ad text reading 買套餐, A大獎

買套餐, A大獎
(Mǎi tàocān, A dà jiǎng)
(Buy a set meal, score a big prize.)

Here, the Roman letter “A” is used to represent a Taiwanese verb that means something like “get in an easy manner” or “make off with” — though the fine print says that customers just have a chance to get a prize, not that they necessarily will win one.

A is often used in A-qián (“A錢”: to A money), a mixed Taiwanese and Mandarin term that means embezzle/embezzlement.

Perhaps the Ministry of Education has issued an official Chinese character for this morpheme. But even if it has, most people would have no idea how to read it, and it would probably be of spurious origin to boot, just like most of the other characters the ministry has issued. Where a Taiwanese morpheme sounds like the English name of a Roman letter, the romanized form is likely to prevail over the Chinese character.

There are other interesting things about this ad. But I’ll get to those in another post.

early Chinese astrology: SPP

In 1995 a joint Sino-Japanese archaeological expedition excavated a Niya burial ground and found a bowman’s armband in the tomb of a “beautifully dressed Europoid couple” (i.e., definitely not Han). Although it’s nearly two thousand years old, it’s remarkably well preserved, even in its colors.

detail of the brocade, showing the Chinese characters discussed in the post

The text (right to left) reads “wǔxīng chū dōngfāng lì Zhōngguó” (五星出東方利中國 / 五星出东方利中国) (“when the five planets appear in the east it is beneficial for China”).

As David W. Pankenier — the author of Popular Astrology and Border Affairs in Early China: An Archaeological Confirmation (2.3 MB PDF), the latest rerelease from Sino-Platonic Papers — notes, “One could hardly ask for more eloquent testimony to the pervasiveness of astrological thinking in early China than this accessory from one of the remotest frontiers of the empire.” (See his paper for all sorts of details.)

As I suppose befits something on the subject of astrology, some superstitious people in China latched onto the phrase as a prophecy of the greatness of the People’s Republic of China (whose flag has five stars). The text, however, doesn’t refer to wǔ [kē] xīng (“five stars”) but to the wǔxīng (“the five planets”), which people these days might call the wǔ dà xīngxing (“five greater stars”). But superstitious nationalists aren’t known for letting facts get in the way of what they want to believe.

The five planets are:

  • 火星 Huǒxīng Mars (lit. “fire star”)
  • 水星 Shuǐxīng Mercury (“water star”)
  • 木星 Mùxīng Jupiter (“wood star”)
  • 金星 Jīnxīng Venus (“metal star”)
  • 土星 Tǔxīng Saturn (“earth star”)

Those in beginning Mandarin classes are no doubt grateful that for days of the week modern standard Mandarin has adopted what is mainly a numbering system (i.e., lǐbàiyī, lǐbài’èr, lǐbàisān, and so on: day of the week no. 1, day of the week no. 2, day of the week no. 3, …) rather than the old names, which use the names of the planets (along with the sun and moon). Students of Japanese aren’t so lucky.

Or maybe I’ve got that backward; many who study languages that use Chinese characters as a script have more than a bit of masochism.