URLs, Chinese characters, and the Roman alphabet

In “Will China Build a Separate Internet?”, John Yunker, citing Naseem Javed’s “When Will The Internet Be Divided Among Nations?”, states: “Naseem does raise a very important point — for Chinese speakers, the Internet is far from user-friendly. The major obstacle is the URL, which is still limited to ASCII (Latin) characters.”

I don’t see where Naseem Javed made that particular point — but no matter. I just want to note that URLs in ASCII do not present an obstacle to Internet users in China. After all, the Roman alphabet (specifically, Pinyin) is what most people use to enter Chinese characters on computers in the first place. And even those in China who don’t use Pinyin to input Chinese characters are perfectly capable of using their, yes, QWERTY keyboards to type the ASCII in URLs, the Roman alphabet having been taught for decades to every schoolchild in China (at least to those now literate enough to use the Internet in the first place).

On the other hand, having to enter Chinese-character URLs would be an obstacle to most of the world’s population.

Those looking to argue that ASCII URLs could be an obstacle would do better to look to Russia, Greece, or Saudi Arabia.

The folks at ICANN and the IETF are working to upgrade the DNS to Unicode, but this will take time. A workaround is already in use: Web users can enter a URL in Chinese characters, and the browser converts it behind the scenes into ASCII characters using an encoding known as “Punycode.” I’m not sure, though, how widely this system is currently used.
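The “behind the scenes” step happens in the client: each non-ASCII label of a hostname is turned into an ASCII label starting with “xn--” before the DNS lookup, and can be turned back into Chinese characters for display. As a minimal sketch, Python’s standard library ships an `idna` codec that does both directions. Note that this codec implements the older IDNA 2003 rules; assuming any particular browser behaves identically is an assumption.

```python
# Hostname from the post; the leftmost label is written in Chinese characters.
unicode_host = "拼音.pinyin.info"

# ToASCII direction: the form actually sent to the DNS.
ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)  # xn--muuy29i.pinyin.info

# ToUnicode direction: the form a browser could show in its address bar.
display_host = ascii_host.encode("ascii").decode("idna")
print(display_host)  # 拼音.pinyin.info
```

Only the non-ASCII label is rewritten; “pinyin” and “info” pass through unchanged, which is why both forms of the address can reach the same server.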

IE7 is supposed to have good support for Punycode. Now if only IE would finally get CSS right….

Here’s an example of Punycode: 拼音 is xn--muuy29i, according to an open-source Punycode converter. Thus, http://拼音.pinyin.info and http://xn--muuy29i.pinyin.info should both lead to the same page. And I would hope that the address bar in the browser would read http://拼音.pinyin.info instead of the xn--muuy29i ASCII version.
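To show where “muuy29i” comes from, here is a minimal from-scratch sketch of the Punycode (“bootstring”) encoder specified in RFC 3492. The function names `punycode_encode` and `to_ascii_label` are hypothetical, chosen for illustration. Real IDNA processing also normalizes each label (nameprep) before this step, which the sketch skips.

```python
# Parameters from RFC 3492, section 5.
BASE, TMIN, TMAX = 36, 1, 26
SKEW, DAMP = 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def _digit(d: int) -> str:
    """Map digit values 0-25 to 'a'-'z' and 26-35 to '0'-'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    """Encode one Unicode label with the Punycode bootstring algorithm."""
    codepoints = [ord(c) for c in label]
    # Basic (ASCII) code points are copied to the output as-is.
    output = [c for c in label if ord(c) < 0x80]
    b = h = len(output)
    if b:
        output.append("-")  # delimiter between basic part and encoded deltas
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(codepoints):
        # Next smallest code point not yet handled.
        m = min(c for c in codepoints if c >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in codepoints:
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


def to_ascii_label(label: str) -> str:
    """Prefix 'xn--' to the Punycode form of labels that contain non-ASCII text."""
    if all(ord(c) < 0x80 for c in label):
        return label
    return "xn--" + punycode_encode(label)


print(to_ascii_label("拼音"))  # xn--muuy29i
```

Python’s built-in `punycode` codec should agree with this sketch on the label body; for example, `"拼音".encode("punycode")` returns `b"muuy29i"`.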

If you add a comment on how well the Punycode tests work for you, please mention your computer’s operating system and browser. (I’m using Win2K and Opera 8.51, and both http://拼音.pinyin.info and http://xn--muuy29i.pinyin.info work fine.)

4 thoughts on “URLs, Chinese characters, and the Roman alphabet”

  1. The Chinese government has already (about a year ago) built a domain name system separate from that of the rest of the world; see http://www.icannwatch.org/article.pl?sid=05/04/27/1512243

    ASCII domain names may not be a huge problem, but they do create an unnecessary hurdle for Chinese users. Give a company the choice between a Chinese-character name and a pinyin name, and I bet they’d all prefer the Chinese characters (and also register the pinyin version, though of course you get more name clashes there, especially if you’re not including tones).

    Of course it’s worse in Taiwan, where no one has the faintest clue what a ‘correct’ romanization would be. A couple of days ago I heard a radio advert that spent most of its time spelling out the URL letter by letter, excruciatingly slowly.

    Incidentally, punycode works fine in Firefox (1.5, and I believe much earlier versions). BTW, your link to the John Yunker article seems to be wrong (trackback ping url?)

  2. Thanks for your reply and for bringing up the Chinese-character TLDs. And I probably should have mentioned, too, that Chinese-character domain names are already available in China, Taiwan, Hong Kong, and elsewhere.

    I still disagree that ASCII is a “hurdle.” In China, it is not a technical problem for those who wish to enter a URL — or register one either, for that matter. Whether it’s fair that even people in China should be expected to use the Roman alphabet rather than Chinese characters — that’s a different question.

    I quite agree about the situation in Taiwan, though. One of the reasons I switched from Romanization.com was the length of time it took to spell that out. “R-O-M-A-N-I-Z-A-T-I-O-N.” And then I’d invariably have to add, “No, that’s ‘R’ like ‘Roger’ or ‘rabbit.’ And ‘N’ as in ‘no.’ ‘No,’ N-O: ‘No.’” Etc. It took ages.

    I’ve corrected the link to the original article.

  3. I tried the two urls in Opera 8.51 and they both worked fine. The punycode url displayed as Chinese characters. However, in IE 6.0 it was a different story. The punycode url loaded the page, but the url didn’t display as Chinese characters. The url with Chinese characters didn’t work at all. The operating system is Windows XP.

  4. Thanks for the useful info. I tested these two URLs as follows:

    WinXP SP3
    =========
    - IE 7.0: both URLs worked as documented, once I had manually added the Chinese character set under Language settings (a dialog popped up and asked).

    - Firefox 2.0: both worked.

    - Safari 3.1.1: both worked.

    - Netscape Navigator 9.0.0: both worked.

    - Opera 9.25: both worked.

    Mac OS X 10.5.x
    ===============
    - Firefox 2.0: both worked.

    - Safari 3.1.1: both worked.
