I’ve had a spate of requests recently for the code for Pinyin.info’s tool that converts Chinese characters to Unicode numeric character references (i.e., something that converts, say, “漢語拼音” into “漢語拼音”). Since I’m a believer in open-source work — and since people could find the code anyway if they look carefully enough in the Web page’s source code — I might as well publish it.
This tool can be very handy when making Web pages that use a variety of scripts. (It works on Cyrillic, etc., as well.) I often employ it myself.
Here’s the heart of the code:
function convertToEntities() {
var tstr = document.form.unicode.value;
var bstr = '';
for(i=0; i<tstr.length; i++)
{
if(tstr.charCodeAt(i)>127)
{
bstr += '' + tstr.charCodeAt(i) + ';';
}
else
{
bstr += tstr.charAt(i);
}
}
document.form.entity.value = bstr;
}
This sleek little bit of Javascript is originally by Steve Minutillo and used here on Pinyin.info with his permission. I may have tweaked the code a little myself; but that was so long ago I don’t remember well. (I’ve had the converter here for about five years.) Anyway, if you use this please acknowledge Steve’s authorship; and of course I always greatly appreciate links back to Pinyin.info.
If anyone knows how to do the same thing in PHP — preferably with no more code than used above, please let me know.
See also: separating Pinyin syllables: PHP code.