convert Chinese characters to Unicode character references: javascript

I’ve had a spate of requests recently for the code for Pinyin.info’s tool that converts Chinese characters to Unicode numeric character references (i.e., something that converts, say, “漢語拼音” into “&#28450;&#35486;&#25340;&#38899;”). Since I’m a believer in open-source work — and since people could find the code anyway if they looked carefully enough in the Web page’s source code — I might as well publish it.

This tool can be very handy when making Web pages that use a variety of scripts. (It works on Cyrillic and other non-ASCII text as well.) I often employ it myself.

Here’s the heart of the code:


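function convertToEntities() {
  // Read the text to convert from the form field named "unicode".
  var tstr = document.form.unicode.value;
  var bstr = '';
  // Declare i with var so the loop doesn't create an implicit global.
  for (var i = 0; i < tstr.length; i++) {
    var code = tstr.charCodeAt(i);
    if (code > 127) {
      // Non-ASCII: output a decimal numeric character reference.
      bstr += '&#' + code + ';';
    } else {
      // ASCII passes through unchanged.
      bstr += tstr.charAt(i);
    }
  }
  // Put the result in the form field named "entity".
  document.form.entity.value = bstr;
}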
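One caveat about the approach above: charCodeAt works on UTF-16 code units, so a character outside the Basic Multilingual Plane (for example, the rarer Hanzi in CJK Unified Ideographs Extension B, which begins at U+20000) comes out as two references to surrogate halves instead of one reference to the real character. HTML parsers treat such references as errors, and browsers typically display replacement characters instead. The version below is a minimal sketch of one way around this, not Steve's code: it assumes a JavaScript engine with ES2015 support (for...of over strings and String.prototype.codePointAt), the function name toNumericReferences is hypothetical, and it takes and returns a plain string rather than reading from a form, so it can be tried outside a browser.

```javascript
// Sketch (hypothetical name): turn every character above U+007F in
// `text` into a decimal numeric character reference, e.g. 漢 -> &#28450;.
function toNumericReferences(text) {
  let out = '';
  // for...of walks the string by code point, so a surrogate pair
  // (a character above U+FFFF) arrives here as a single `ch`.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    // Same rule as the original: ASCII passes through, everything
    // else becomes &#NNNN;.
    out += cp > 127 ? '&#' + cp + ';' : ch;
  }
  return out;
}

// Example usage:
console.log(toNumericReferences('漢語拼音'));
// prints "&#28450;&#35486;&#25340;&#38899;"
console.log(toNumericReferences('\u{20000}'));
// prints "&#131072;" (one reference, not two surrogate halves)
```

As in the original, ASCII characters such as & and < pass through unchanged, so the output is meant for pasting into ordinary page text rather than for escaping arbitrary markup.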

This sleek little bit of JavaScript was originally written by Steve Minutillo and is used here on Pinyin.info with his permission. I may have tweaked the code a little myself, but that was so long ago that I don’t remember. (I’ve had the converter here for about five years.) Anyway, if you use this, please acknowledge Steve’s authorship; and of course I always greatly appreciate links back to Pinyin.info.

If anyone knows how to do the same thing in PHP (preferably with no more code than is used above), please let me know.

See also: separating Pinyin syllables: PHP code.

Taiwan’s implementation of Hanyu Pinyin to be limited, gradual

The Ministry of Education’s National Languages Committee on Wednesday issued very general guidelines for how Taiwan will go about implementing Hanyu Pinyin.

Unfortunately, they’re not very clear. But long years of experience have taught me that the most pessimistic interpretation (from the standpoint of Pinyin advocates) is probably the correct one. One guideline, for example, states:

Guónèi dìmíng shǔ guójì tōngyòng huò yuēdìngsúchéng zhě, wúxū gēnggǎi.
(Domestic place names that are internationally known or established by convention need not change.)

That’s going to be the excuse used to justify keeping all too many names in bastardized Wade-Giles or other largely useless systems. Thus, we’re probably stuck with not just old forms of names of big cities and counties (e.g., Kaohsiung and Taichung rather than Gaoxiong and Taizhong) but also old forms of lesser-known cities and counties (e.g., Taitung and Keelung rather than Taidong and Jilong). If this is the extent of things, it would copy the policy that the previous administration applied, which I think would be a terrible mistake.

Taiwan’s romanization situation: plus ça change, plus c’est la même chose (the more things change, the more they stay the same).

Of course, there’s also the possibility that this will be used as an excuse to keep even more old forms than the DPP’s Tongyong policy did, e.g., Panchiao and Hsintien rather than Banqiao and Xindian (or Tongyong’s Banciao and Sindian). In that case, the expression might better be “Taiwan’s romanization situation: one step forward, two steps back.”


Gaoxiong education chief backs city retaining Tongyong

The news on Taiwan’s romanization situation has been coming in fast over the past few days. Unfortunately I’ve been too busy to report much on this. But rest assured that I am trying to get some things done behind the scenes … for all the good that will do given Taiwan’s piss-poor record on this issue. Still, I’m trying to remain hopeful.

Last week the deputy chief of Gaoxiong’s (Kaohsiung’s) Bureau of Education said that he was in favor of the city adopting the international system for romanizing Mandarin, Hanyu Pinyin. But on Friday his boss, Cài Qīnghuá, slapped down that idea.

Cai said that almost no schools reported problems with Tongyong Pinyin. I have no idea what that has to do with anything. But that was part of his justification for backing Tongyong.

He also said it would cost too much money to change, throwing out a reportedly conservative estimate of NT$900 million (US$28 million), which I think is likely a gross overestimate.

Here’s the story:

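Gāoxióng shìzhèngfǔ dàodǐ zhī bù zhīchí Hànyǔ Pīnyīn? Gāoxióng Shì Jiàoyùjú zhǎng Cài Qīnghuá zuótiān biǎoshì, quán shì yī sì wǔ suǒ huíbào xuéxiào zhōng, zhǐyǒu sì suǒ tíjí Tōngyòng Pīnyīn shǐyòng de wèntí, juédàduōshù xuéxiào bìngwú yìjian, Gāoxióng shìzhèngfǔ jiù “zhǔguǎn dānwèi zài yèwù tuīdòng shàng, shì-fǒu yǒu xūyào xiézhù shìxiàng” wèntí shí, huífù “pīnyīn zhèngcè xū yǔ guójì jiēguǐ, jiànyì cǎiyòng guójì jiān duōshù shǐyòng de pīnyīn xìtǒng Hànyǔ Pīnyīn.” Shì Jiàoyùjú zhǔ mì de yìjian, tā méi zhùyìdào.
(Does the Gaoxiong City Government support Hanyu Pinyin or not? Gaoxiong Bureau of Education Director Cai Qinghua said yesterday that of the 145 schools citywide that responded, only four mentioned problems with using Tongyong Pinyin, and the vast majority of schools had no objections. Regarding the city government’s reply to the question of “whether the competent authorities need assistance with anything in promoting this work,” which read “Romanization policy must be in line with international practice; we recommend adopting Hanyu Pinyin, the romanization system used by most countries,” he said that had been the opinion of the bureau’s chief secretary and that he had not noticed it.)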

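Cài Qīnghuá shuō, mùqián háishi zhǔzhāng yányòng Tōngyòng Pīnyīn, fǒuzé gēnggǎi Gāoxióng Shì guāngshì lùbiāo, dìbiāo, biāozhì děng, bǎoshǒu gūjì jiù xū huāfei yīdiǎn jiǔyì yuán.
(Cai Qinghua said that for now he still favors continuing to use Tongyong Pinyin, since otherwise changing just Gaoxiong’s road signs, landmark signs, markers, and the like would, by a conservative estimate, cost NT$190 million.)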

source: Gāoxióng Shì Jiàoyùjú zhǎng zhǔzhāng: yányòng Tōngyòng Pīnyīn (高市教育局長 主張沿用通用拼音), Zìyóu Shíbào (Liberty Times), September 20, 2008

detailed rules for Hanyu Pinyin: a major addition to Pinyin.Info

For several years I’ve had online the brief official principles for writing Hanyu Pinyin. But those go only so far. Fortunately, Yin Binyong (Yǐn Bīnyōng / 尹斌庸) (1930-2003), who was involved in work on Hanyu Pinyin from the beginning, wrote two books on the subject, producing a detailed, logical, and effective orthography for Pinyin.

The only one of those two books with English explanations as well as Mandarin, Chinese Romanization: Pronunciation and Orthography (Mandarin title: Hànyǔ Pīnyīn hé Zhèngcífǎ / 汉语拼音和正词法 / 漢語拼音和正詞法), has gone out of print, and at present there are no plans to bring it back into print. Fortunately, however, I was eventually able to secure the rights to reproduce this work on Pinyin.Info. Yes, the entire book. So everybody, be sure to thank the generous publisher by buying Sinolingua’s books.

This book, which is nearly 600 pages long, is a mother lode of information. It would be difficult for me to overstate its importance. Over the next few months I’ll be releasing the work in sections. I had intended to delay this a little, as I have had to wait for a fancy new scanner and am still awaiting some OCR software that can handle Hanzi as well as the Roman alphabet. (This Web site is an expensive hobby!) But since Taiwan has recently adopted Hanyu Pinyin I will be releasing some material soon (without OCR, for the time being) in the hope of helping Taiwan avoid making mistakes in its implementation of an orthography for Pinyin here.

Watch this blog for updates.

John DeFrancis video

Ten years ago John DeFrancis was awarded the Chinese Language Teachers Association’s first lifetime achievement award. Since he could not be present at the association’s annual conference to receive the award, he sent a videotape of a 12-minute acceptance speech. The video was recently edited down to 6:27 and uploaded to YouTube: John DeFrancis remarks.

Here’s my summary of the main points:

0:00 — While working on what he intended to be a largely political study of Chinese nationalism, DeFrancis encountered references to people who wanted China to adopt an alphabetic writing system, an idea which he initially dismissed. But discovering Lu Xun’s interest in romanization led him to investigate the matter further. [I’m frustrated by the cut away from this discussion. Perhaps a fuller version of the video will be posted later.]
1:30 — Emphasizes he’s not in favor of completely abandoning Chinese characters. Rather, he favors digraphia.
2:30 — “I’d like to mention three aspects of the Chinese field which have interested me.”

  1. pedagogy (2:50) — lots of advancements
  2. linguistic aspect (3:20) — that’s also progressing well
  3. socio-linguistics (3:52) — the field isn’t doing as well as it should be

5:00 — computers and Chinese characters. DeFrancis tears into the Chinese government for its emphasis on shape-based character-input methods rather than Pinyin.

Park Street redux

As some of you may recall, last October I wrote about finding official signs for a Taipei street that used English rather than romanization (Street names in English translation: trend or error?).

Some of the signs for what is written in Hanzi “園區街” (Yuánqū Jiē) read, in Taipei’s standard but stupid InTerCaPiTaLiZaTion, “YuanQu St.” while others read “Park St.” (which, by the way, is a misleading translation). I called the Taipei City Government about this and was informed that Park was an error and that the signs would be fixed to read Yuanqu.

Nearly a year has gone by since then. Have any of the street signs been changed?

The answer is yes. The signs, including some new ones, are indeed consistent. All of them now read — have you guessed it yet? — “Park St.”

That’s right: They eliminated the signs that were correct and put up new signs that are wrong. I’m trying to relax, so I won’t write out all of the many maledictions I have been muttering about Taipei City Government and its bureaucracy.

Here’s one of the street signs in October 2007:
YuanQu St.

Here’s the same sign in August 2008:
Park St.

A close-up, showing how “Park” was pasted over “YuanQu”.