convert Chinese characters to Unicode character references: javascript

I’ve had a spate of requests recently for the code for Pinyin.info’s tool that converts Chinese characters to Unicode numeric character references (i.e., something that converts, say, “漢語拼音” into “&#28450;&#35486;&#25340;&#38899;”). Since I’m a believer in open-source work, and since people could find the code anyway if they look carefully enough in the Web page’s source code, I might as well publish it.

This tool can be very handy when making Web pages that use a variety of scripts. (It works on Cyrillic, etc., as well.) I often employ it myself.

Here’s the heart of the code:


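function convertToEntities() {
  // Read the text to convert from the form's "unicode" field.
  var tstr = document.form.unicode.value;
  var bstr = '';
  // Declare i locally so the loop does not create an implicit global.
  for (var i = 0; i < tstr.length; i++) {
    if (tstr.charCodeAt(i) > 127) {
      // Non-ASCII: emit a decimal numeric character reference.
      bstr += '&#' + tstr.charCodeAt(i) + ';';
    } else {
      // ASCII passes through unchanged.
      bstr += tstr.charAt(i);
    }
  }
  // Write the result to the form's "entity" field.
  document.form.entity.value = bstr;
}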

This sleek little bit of JavaScript is originally by Steve Minutillo and is used here on Pinyin.info with his permission. I may have tweaked the code a little myself, but that was so long ago I don’t remember well. (I’ve had the converter here for about five years.) Anyway, if you use this please acknowledge Steve’s authorship; and of course I always greatly appreciate links back to Pinyin.info.
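
One limitation worth noting: charCodeAt works on UTF-16 code units, so a character outside the Basic Multilingual Plane (some of the rarer Hanzi, for instance) comes out as two surrogate references rather than one. Here is a minimal sketch of a variant that works by code point instead. The function name toNumericReferences is made up for illustration and is not part of Steve's code or the Pinyin.info tool; it also takes and returns a string rather than reading from the form fields.

```javascript
// Hypothetical helper (not part of the original tool): converts every
// non-ASCII character in a string to a decimal numeric character reference.
function toNumericReferences(text) {
  let result = '';
  // for...of iterates by code point, so a surrogate pair stays together.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    // ASCII passes through unchanged; everything else becomes &#NNNN;
    result += codePoint > 127 ? '&#' + codePoint + ';' : ch;
  }
  return result;
}

console.log(toNumericReferences('漢語拼音'));
// &#28450;&#35486;&#25340;&#38899;
```

To use it with the same form as above, something like document.form.entity.value = toNumericReferences(document.form.unicode.value); should do, assuming a browser recent enough to support codePointAt and for...of.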

If anyone knows how to do the same thing in PHP, preferably with no more code than used above, please let me know.

See also: separating Pinyin syllables: PHP code.

Taiwan’s implementation of Hanyu Pinyin to be limited, gradual

The Ministry of Education’s National Languages Committee on Wednesday issued very general guidelines for how Taiwan will go about implementing Hanyu Pinyin.

Unfortunately, they’re not very clear. But long years of experience have taught me that the most pessimistic interpretation (from the standpoint of Pinyin advocates) is probably the correct one. One guideline, for example, states:

Guónèi dìmíng shǔ guójì tōngyòng huò yuēdìngsúchéng zhě, wúxū gēnggǎi.
(Domestic place names that are internationally known or established by convention need not change.)

That’s going to be the excuse used to justify keeping all too many names in bastardized Wade-Giles or other largely useless systems. Thus, we’re probably stuck with not just old forms of names of big cities and counties (e.g., Kaohsiung and Taichung rather than Gaoxiong and Taizhong) but also old forms of lesser-known cities and counties (e.g., Taitung and Keelung rather than Taidong and Jilong). If this is the extent of things, it would copy the policy that the previous administration applied, which I think would be a terrible mistake.

Taiwan’s romanization situation: plus ça change, plus c’est la même chose.

Of course, there’s also the possibility that this will be used as an excuse to keep even more old forms than the DPP’s Tongyong policy did, e.g., Panchiao and Hsintien rather than Banqiao and Xindian (or Tongyong’s Banciao and Sindian). In which case the expression might better be, “Taiwan’s romanization situation: one step forward, two steps back.”

sources:

Hanyu Pinyin and common nouns: the rules

I’ve just added another long section of Yin Binyong’s book on the detailed rules for Hanyu Pinyin. This part (pp. 78-138) covers common nouns (2.4 MB PDF).

I should have mentioned earlier that this book isn’t useful just for those who want to know more about Pinyin. It can also serve as an excellent work for those learning Mandarin, since it tends to group like ideas together and gives many examples of how combinations form other words.

All that, and it’s absolutely free. So go ahead and download it now.

Here are the main divisions:

  1. Introduction
  2. Simple Nouns
  3. Nouns with Prefixes
  4. Nouns with Suffixes
  5. Reduplicated Nouns
  6. Nouns of Modifier-Modified Construction
  7. Nouns of Coordinate Construction
  8. Nouns of Verb-Object and Subject-Predicate Construction
  9. Locational Nouns
  10. Nouns of Time
  11. Noun Phrases that Express a Single Concept

Gaoxiong education chief backs city retaining Tongyong

The news on Taiwan’s romanization situation has been coming in fast over the past few days. Unfortunately I’ve been too busy to report much on this. But rest assured that I am trying to get some things done behind the scenes … for all the good that will do given Taiwan’s piss-poor record on this issue. Still, I’m trying to remain hopeful.

Last week the deputy chief of Gaoxiong’s (Kaohsiung’s) Bureau of Education said that he was in favor of the city adopting the international system for romanizing Mandarin, Hanyu Pinyin. But on Friday his boss, Cài Qīnghuá, slapped down that idea.

Cai said that almost no schools reported problems with Tongyong Pinyin. I have no idea what that has to do with anything. But that was part of his justification for backing Tongyong.

He also said it would cost too much money to change, throwing out a reportedly conservative estimate of NT$900 million (US$28 million), which I think is likely a gross overestimate.

Here’s the story:

Gāoxióng shìzhèngfǔ dàodǐ zhī bù zhīchí Hànyǔ Pīnyīn? Gāoxióng Shì Jiàoyùjú zhǎng Cài Qīnghuá zuótiān biǎoshì, quán shì yī sì wǔ suǒ huíbào xuéxiào zhōng, zhǐyǒu sì suǒ tíjí Tōngyòng Pīnyīn shǐyòng de wèntí, juédàduōshù xuéxiào bìngwú yìjian, Gāoxióng shìzhèngfǔ jiù “zhǔguǎn dānwèi zài yèwù tuīdòng shàng, shì-fǒu yǒu xūyào xiézhù shìxiàng” wèntí shí, huífù “pīnyīn zhèngcè xū yǔ guójì jiēguǐ, jiànyì cǎiyòng guójì jiān duōshù shǐyòng de pīnyīn xìtǒng Hànyǔ Pīnyīn.” Shì Jiàoyùjú zhǔ mì de yìjian, tā méi zhùyìdào.

Cài Qīnghuá shuō, mùqián háishi zhǔzhāng yányòng Tōngyòng Pīnyīn, fǒuzé gēnggǎi Gāoxióng Shì guāngshì lùbiāo, dìbiāo, biāozhì děng, bǎoshǒu gūjì jiù xū huāfei yīdiǎn jiǔyì yuán.

source: Gāoxióng Shì Jiàoyùjú zhǎng zhǔzhāng: yányòng Tōngyòng Pīnyīn (高市教育局長 主張沿用通用拼音), Zìyóu Shíbào (Liberty Times), September 20, 2008

Hanyu Pinyin and proper nouns

The first large section from Chinese Romanization: Pronunciation and Orthography to go online is the one on proper nouns (2 MB PDF).

  1. Introduction
  2. Place Names
  3. Personal Names
    1. formal names
    2. non-formal names
    3. forms of address
  4. Transliteration of Foreign Place Names and Personal Names
  5. Other Proper Nouns
    1. names of nationalities
    2. names of religions and deities
    3. names of dynasties
    4. names of festivals and holidays
    5. names of celestial bodies
    6. names of languages
    7. titles of literary and artistic works
    8. titles of newspapers and magazines
    9. names of social units
    10. trademarks
  6. Proper Nouns in Combination with Common Nouns

Thus, these rules cover many of the applications of Pinyin that appear on signage.

I’ll post a version with OCR later (probably weeks or months rather than days). In the meantime, you can use the bookmarks within the PDF file to navigate the document.

further reading:

detailed rules for Hanyu Pinyin: a major addition to Pinyin.Info

For several years I’ve had online the brief official principles for writing Hanyu Pinyin. But those go only so far. Fortunately, Yin Binyong (Yǐn Bīnyōng / 尹斌庸) (1930-2003), who was involved in work on Hanyu Pinyin from the beginning, wrote two books on the subject, producing a detailed, logical, and effective orthography for Pinyin.

The only one of those two books with English explanations as well as Mandarin, Chinese Romanization: Pronunciation and Orthography (Mandarin title: Hànyǔ Pīnyīn hé Zhèngcífǎ / 汉语拼音和正词法 / 漢語拼音和正詞法), has gone out of print; and at present there are no plans to bring it back into print. Fortunately, however, I was eventually able to secure the rights to reproduce this work on Pinyin.Info. Yes, the entire book. So everybody be sure to say thank you to the generous publisher by buying Sinolingua’s books.

This book, which is nearly 600 pages long, is a mother lode of information. It would be difficult for me to overstate its importance. Over the next few months I’ll be releasing the work in sections. I had intended to delay this a little, as I have had to wait for a fancy new scanner and am still awaiting some OCR software that can handle Hanzi as well as the Roman alphabet. (This Web site is an expensive hobby!) But since Taiwan has recently adopted Hanyu Pinyin I will be releasing some material soon (without OCR, for the time being) in the hope of helping Taiwan avoid making mistakes in its implementation of an orthography for Pinyin here.

Watch this blog for updates.

The Art of War: a companion volume

Sonshi, the largest website dedicated to Sun Zi’s (Sun Tzu’s) Art of War, recently selected Victor H. Mair’s new translation as “the #1 Art of War edition.”

In announcing its judgment, the site stated, “how rare a book that courageously stands up to centuries of established thought, proceeds to knock it down with sound logic and proof, and succeeds in convincing even the Old Guard to change their views.”

Professor Mair has just published a free, book-length companion to his translation: Soldierly Methods: Vade Mecum for an Iconoclastic Translation of Sun Zi bingfa, with a complete transcription and word-for-word glosses of the Manchu translation by H. T. Toh (1 MB PDF).

Yes, all that and Manchu too. The appendixes might well supply the longest text in romanized Manchu available online — not to mention the longest one with English translation. (Perhaps someone from Echoes of Manchu can comment.)

And I’d like to note that the introduction to the transcription offers a cool word I hadn’t come across before: Mandjurist, which is German for “Manchu philologist.”

Here’s the table of contents:

  • Preface
  • Principles of Translation
  • Guide to Pronunciation
  • Key Terms
  • Abbreviations
  • Discussion
    • The Book and Its Title
    • Authorship
    • Historical Background
    • Dating
    • Stylistics and Statistics
    • Techniques and Technology
    • Taoistic Aspects
    • Eurasian Parallels
    • On the World Stage
    • Notes
  • Appendix I: The Pseudo-Biography of Sun Wu
  • Appendix II: Further Notes on Selected Key Terms
  • Appendix III: Transcription of the Manchu Translation of the Sun Zi with Word-for-Word English Glosses by Hoong Teik Toh
  • Appendix IV: Transcription of the Manchu Translation of the Sun Zi by Hoong Teik Toh
  • Bibliography

This is issue no. 178 of Sino-Platonic Papers.

John DeFrancis video

Ten years ago John DeFrancis was awarded the Chinese Language Teachers Association’s first lifetime achievement award. Since he could not be present at the association’s annual conference to receive the award, he sent a videotape of a 12-minute acceptance speech. The video was recently edited down to 6:27 and uploaded to YouTube: John DeFrancis remarks.

Here’s my summary of the main points:

0:00 — While working on what he intended to be a largely political study of Chinese nationalism, DeFrancis encountered references to people who wanted China to adopt an alphabetic writing system, an idea which he initially dismissed. But discovering Lu Xun’s interest in romanization led him to investigate the matter further. [I’m frustrated by the cut away from this discussion. Perhaps a fuller version of the video will be posted later.]
1:30 — Emphasizes he’s not in favor of completely abandoning Chinese characters. Rather, he favors digraphia.
2:30 — “I’d like to mention three aspects of the Chinese field which have interested me.”

  1. pedagogy (2:50) — lots of advancements
  2. linguistic aspect (3:20) — that’s also progressing well
  3. socio-linguistics (3:52) — the field isn’t doing as well as it should be

5:00 — computers and Chinese characters. DeFrancis tears into the Chinese government for its emphasis on shape-based character-input methods rather than Pinyin.