Some recent comments on the Hanzi domain name situation brought to mind a rant I was working on last month and then abandoned. But it seems worth finishing (relatively speaking, because this is a topic that touches upon so many areas that I could never get through it all), because the problem I discuss is a fairly common one. So today I’d like to address what I think of as the “like, wow” fetish of Chinese characters. In this, Chinese characters are regarded as if they bestowed a wonderful gift upon the reader that no other script could. Exactly how they do that, and what exactly that gift is, generally doesn’t make much sense.

This sort of thing is common, and not just among New Age nonsense. A good example of this approach is found in Search Engine of the Song Dynasty, an op-ed piece published in the New York Times in mid-May. Basically, the author discusses how having URLs in Chinese characters is a good thing, but does so in a vague, flowery way that brings to mind a stoned grad student with a large vocabulary, which might not be so bad if the author had gotten the facts straight.

I had hoped for at least a little better, given that the author, Ruiyan Xu, has completed a novel, The Lost and Forgotten Languages of Shanghai, whose protagonist has bilingual aphasia. So one would expect Xu, who was born in Shanghai and moved to the United States at the age of 10, to have a better-than-basic understanding of linguistics. Alas, no — not if the article is anything to go by.

My annoyance here, though, isn’t specifically with Xu, who seems like a nice person and whose book has been getting some good advance reviews. It’s more with the “like, wow” phenomenon in general and the eagerness of the mainstream press to publish things about “Chinese” even though the substance of such articles falls apart if one devotes even just a little effort to examining it.

So let’s get into it.

Baidu.com, the popular search engine often called the Chinese Google, got its name from a poem written during the Song Dynasty (960-1279). The poem is about a man searching for a woman at a busy festival, about the search for clarity amid chaos. Together, the Chinese characters băi [sic] and dù mean “hundreds of ways,” and come out of the last lines of the poem: “Restlessly I searched for her thousands, hundreds of ways./ Suddenly I turned, and there she was in the receding light.”

For reference, I’ll provide the poem. I’ve put the Chinese characters used by Baidu in bold and red.



The author of the poem, Xin Qiji (Xīn Qìjí / 辛棄疾 / 辛弃疾), lived from 1140 to 1207 and was thus a contemporary of such Western poets as the troubadours Bertran de Born, Bernart de Ventadorn, and Giraut de Borneil, hardly poets whose work suffered for having been written with an alphabet.

Baidu, rendered in Chinese, is rich with linguistic, aesthetic and historical meaning. But written phonetically in Latin letters (as I must do here because of the constraints of the newspaper medium and so that more American readers can understand), it is barely anchored to the two original characters; along the way, it has lost its precision and its poetry.

Ugh. Where to start?

I’ll go ahead and skip “precision,” even though that’s perhaps not a word best applied to most poetry written in Literary Sinitic, and start with “rendered in Chinese.” However common the word might be, “Chinese” is a poor choice. In this case, the word seems to be intended to mean not any particular language but rather “Chinese characters,” which are not a language. Here, too, she appears to be blaming Pinyin for having lost something from Literary Sinitic, which is what the poem was written in. But Pinyin isn’t for Literary Sinitic; it’s for modern standard Mandarin. Also, whatever language Xin Qiji spoke could have been written with an alphabet with no loss of meaning, just like all other natural languages.

As Web addresses increasingly transition to non-Latin characters as a result of the changing rules for domain names, that series of Latin letters Chinese people usually see at the top of the screen when they search for something on Baidu may finally turn into intelligible words: “a hundred ways.”

Baidu vs. 百度 on the home page:

Can you feel the difference in precision and poetry? No?

Also, it’s not clear just how much of a “transition to non-Latin characters” there’s going to be, especially where Chinese characters are concerned, and particularly in places like Singapore.

Of course, this expansion of languages for domain names could lead to confusion: users seeking to visit Web sites with names in a script they don’t read could have difficulty putting in the addresses, and Web browsers may need to be reconfigured to support non-Latin characters. The previous system, with domain names composed of numbers, punctuation marks and Latin letters without accents, promoted standardization, wrangling into consistency and simplicity one small part of the Internet.

For “could have difficulty putting in the addresses” read “could find it next to impossible to enter the correct address.” And by “one small part of the Internet,” she appears to mean the name of every single domain on the entire Internet.

But something else, something important, has been lost.

Part of the beauty of the Chinese language comes from a kind of divisibility not possible in a Latin-based language. Chinese is composed of approximately 20,000 single-syllable characters, 10,000 of which are in common use.

No, no, and no.

  • By “Latin-based language” the author seems to be referring not to a Romance language but to a language that uses the Latin alphabet for its standard script.
  • What exactly is this divisibility? Mandarin words can be divided into morphemes. The words of English, French, etc. work the same way.
  • No language is composed of Chinese characters.
  • There are a hell of a lot more than 20,000 Chinese characters.
  • “Common use” is difficult to pin down. But most authorities would give a lower number.

These characters each mean something on their own; they are also combined with other characters to form hundreds of thousands of multisyllabic words.

No, that’s wrong. Again: words — whether they be multisyllabic or monosyllabic — are not made of Chinese characters. Instead, Chinese characters are the script most seen for written Mandarin.

Níhăo, for example, Chinese for “Hello,” is composed of ní — “you,” and hăo — “good.” Isn’t “You good” — both as a statement and a question — a marvelous and strangely precise breakdown of what we’re really saying when we greet someone?

Again, this is assigning meaning to characters, when the meaning is of course in the word itself.

Note, too, that “níhăo” is incorrect in several ways.

  • One of the basic rules of Hanyu Pinyin is that tone sandhi is not indicated. In Mandarin, when two third-tone syllables occur in a row, the first shifts to second tone, so the greeting is pronounced níhǎo. Even so, the diacritical mark over the i should indicate third tone (ǐ) rather than second (í).
  • The diacritic over the a is wrong. It should be ǎ (caron, Unicode U+01CE), not ă (breve, Unicode U+0103): sharp vs. rounded. (You may need to enlarge the fonts on the screen to see this.)
  • Most careful authorities write this with a space rather than as solid: nǐ hǎo rather than nǐhǎo. This, though, is something I don’t much care about. Popular usage of Pinyin as a real script will eventually work this out one way or the other. Also, if someone is going to err in word parsing, I’d much rather they do it by making words solid rather than by breaking up the syllables.
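
For anyone who can't tell the two marks apart on screen, the difference can also be checked mechanically. What follows is only a rough Python sketch using the standard unicodedata module; the helper names describe and has_breve are made up for this illustration and are not part of any Pinyin tool.

```python
import unicodedata

# The two look-alike letters discussed above, written as escapes so the
# difference does not depend on the font.
CARON_A = "\u01ce"  # a with caron: the correct Pinyin third-tone mark
BREVE_A = "\u0103"  # a with breve: the rounded mark used in the op-ed


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def has_breve(text: str) -> bool:
    """True if any letter carries a combining breve (U+0306) once decomposed."""
    return "\u0306" in unicodedata.normalize("NFD", text)


print(describe(CARON_A))  # U+01CE LATIN SMALL LETTER A WITH CARON
print(describe(BREVE_A))  # U+0103 LATIN SMALL LETTER A WITH BREVE
print(has_breve("n\u00edh\u0103o"))      # op-ed spelling: True
print(has_breve("n\u01d0 h\u01ceo"))     # standard spelling: False
```

A check like has_breve only catches the wrong diacritic; it says nothing about the tone sandhi or word-spacing points, which are matters of orthographic convention rather than encoding.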

The Romanization of Chinese into a phonetic system called Pinyin, using the Latin alphabet and diacritics (to indicate the four distinguishing tones in Mandarin), was developed by the Chinese government in the 1950s.

I’m a bit surprised the copy editors at the New York Times let that muddled sentence through. But I’ll pass over it without further observation.

Pinyin makes the language easier to learn and pronounce, and it has the added benefit of making Chinese characters easy to input into a computer. Yet Pinyin, invented for ease and standards, only represents sound.

In other words, Pinyin represents language — that being what writing systems are designed to do. And, yes, it’s easy to learn and use, which happens to be a good thing, not a bad one.

In Chinese, there are multiple characters with the exact same sound. The sound “băi,” for example, means 100, but it can also mean cypress, or arrange. And “Baidu,” without diacritics, can mean “a failed attempt to poison” or “making a religion of gambling.”

My dictionary gives some different phrases. But whatever. Then there’s also the simple point: If there’s a problem with writing Pinyin without diacritics, then don’t write Pinyin without diacritics; write it with them. But I have a hard time imagining how anyone would get such things confused in context.

Q: “Honey, could you check Baidu for information on when that new movie is coming out?”
A: “Baidu? Sorry, could you write that in Chinese characters for me. I can’t tell if you mean “a failed attempt to poison,” “making a religion of gambling,” or the search engine.”

Behind this is just the usual homonym canard. In English, as in other languages, there are many morphemes with the exact same pronunciation (sound). If we look at the closest English has to the Mandarin sounds bai and du, we can get by, buy, bye, bi-, and dew, do, due, etc. — all of which have various meanings. Take a look.

Those who don’t need to be hit over the head again and again to understand the simple point that English has plenty of homonyms but does just fine with an alphabet (as would every other natural language, including of course Mandarin and the other Sinitic languages) may wish to just skim the following blockquote.

All that's without me bothering to get out a big dictionary.

Alas, poor English! How confused we must be to be using a mere alphabet. Oh, if only we could achieve linguistic, aesthetic, and historical meaning!

In the case of Baidu.com, the word, in Latin letters, has slipped away from its original context and meaning, and been turned into a brand.

Baidu is a brand, and is generally thought of as such regardless of what script it is written in. Furthermore, it’s understood as a “word” only as that search engine. In the poem the characters “百度” are used to write not one word but two, and even written in Hanzi this is not something more than a relative handful of people in China or Taiwan would recognize as having come from that poem unless someone told them about it first.

Language is such a basic part of our lives, it seems ordinary and transparent. But language is strange and magical, too: it dredges up history and memory; it simultaneously bestows and destabilizes meaning. Each of the thousands of languages spoken around the world has its own system and rules, its own subversions, its own quixotic beauty. Whenever you try to standardize those languages, whether on the Internet, in schools or in literature, you lose something. What we gain in consistency costs us in precision and beauty.

When Chinese speakers Baidu (like Google, it too is a verb), we look for information on the Internet using a branded search engine. But when we see the characters for băi dù, we might, for one moment, engage with the poetry of our language, remember that what we are really trying to do is find what we were seeking in the receding light. Those sets of meanings, layered like a palimpsest, might appear suddenly, where we least expect them, in the address bar at the top of our browsers. And in some small way, those words, in our own languages, might help us see with clarity, and help us to make sense of the world.

Clarity? Clarity?!

I understand that the author, as a novelist rather than a linguist, might be preoccupied with the whole Ezra Pound “make it new” and “give people new eyes” thing. If so, good for her. But, still, one should not confuse flights of fancy, no matter how cool they might sound, with facts and should at least attempt not to be completely wrong about almost everything, especially when publishing in the New York Times.

If the argument for Chinese characters is supposed to be that their continued, indeed expanded, use is necessary so people can quote poems in Literary Sinitic out of context so that what would be at best a low-single-digit percentage of native speakers of Mandarin or another modern Sinitic language might recognize the allusion despite a lack of context and might get a Hanzi-licious frisson out of the experience … that would have to be one of the most ridiculous things I’ve ever read.

Kicking the irony meter way up on all this is that the author of those remarks on the really cool feelings one can get from reading Chinese characters cannot herself read texts written in them, though she neglected to mention that little bit of information in her New York Times piece.

And for irony on top of irony, as someone who left China at the age of 10, she likely still knows her native Sinitic language, so texts written in romanization could give her the literacy in that language that she lacks in Chinese characters. Romanization could provide meaning; but instead she harps upon the virtues of Chinese characters.

Oh, and for a final bit of irony, here’s something else the author apparently didn’t bother to check: 百度.com already exists. And is anyone surprised to hear that the site at that address is not a search engine of the Song dynasty? Here’s what it looks like.
screenshot of 百度.com -- a linkspam site -- as of July 1, 2010

That’s right: 百度.com is just a linkspam site. But apparently because, unlike baidu.com, it has Chinese characters in the URL, it’s linkspam with its own quixotic beauty; it’s linkspam with its own sets of meanings, layered like a palimpsest; and it’s linkspam that is rich with linguistic, aesthetic and historical meaning.

C’mon, people! Feel the poetry of it! The precision!

Like, wow.

  1. And that was posted in the NY Times? Wow. I can only say hats off to you. Someone who knows a little bit of Chinese or linguistics will quickly realize that her text is full of bogus claims. But you did a great job to reveal all of them in every detail. I wish she would read your post.

  2. Slightly off-topic, but are there any statistics on the input methods used by Chinese internet users? I’ve been unable to find any. If they use pinyin, it’s no easier to enter the new domain names than the traditional Latin ones. I know the argument may be that it’s to make the internet accessible to the people who aren’t currently online, but is the general population in China really so unfamiliar with Roman letters?

  3. I just finished reading Chad Hansen’s article “Chinese Ideographs and Western Ideas”, so I wanted to play devil’s advocate here a little.

    Stipulated: a lot of the claims in the article are wrong, and it’s silly to think that Chinese people would instantly get the poetry of “百度” in a way that they wouldn’t get the poetry of “baidu.”

    In the article, Hansen claims that Western commonsense holds that

    Writing represents speech (=language) represents ideas/meanings represent things

    whereas Chinese commonsense (which you can see reflected even in what the author wrote) is that

    Speech represents characters represent things.

    Characters do the explanatory work that “ideas” do in the West, but because characters are publicly available and historically constructed, they don’t run into the philosophical puzzles that ideas do, such as Wittgenstein’s “no private languages” argument against the existence of Mentalese.

    Because of this inversion of the relationship between speech and writing, you get claims like “Mandarin and Cantonese are dialects.” Dialects? Dialects of what? Written Chinese. The sounds xue and hak are both representations of 學, which in turn represents the physical act of studying.

    Hansen goes on to say a lot of other stuff, and to defend his use of the term “ideograph” for hanzi (though with a slightly different meaning than “idea + writing”), but that’s probably the important part for this discussion.

    So, when you write, “No language is composed of Chinese characters,” my tendency is to disagree. Before the May Fourth Movement, there was a widely used language composed of Chinese characters, Literary Chinese. Of course, even Literary Chinese was strongly influenced by various dialects of Spoken Chinese, especially Mandarin, but the same can be said for American Sign Language. ASL is an independent, visual language, but it is also highly informed by English, and most ASL speakers (signers?) can also understand written English. In the same way, it is perhaps possible for there to be a written language that is a *language* in its own right without necessarily needing the intermediary of speech before it becomes thought.

    Of course, as you say, none of this has anything to do with practical issues like ex-pats being able to read their mother tongue. Basically, it seems like the nation of China is putting its children through a lot of extra work in learning to read, just to give them a slight (and only slight!) leg up if they ever become interested in reading Literary Chinese texts. Some might argue that this is worth it as a cultural patrimony. Others might say that specialists can study hanzi without involving everyone else. Not being Chinese or living in China, I don’t have much to say about which perspective correctly balances values and goals.

    But, getting back to the topic of domain names, it does seem a little silly to get someone to type b-a-i-d-u on their keyboard, press the convert key, select 百度 and then add .com. But I still think it’s good for people to have the choice of using non-ASCII domain names. For languages that already use one alphabet or abugida, it’s silly to have to learn a whole other set of characters just to be able to use the web, on the off chance that someone who can’t read your language will want to type the name of your website but won’t know how to use the Punycode encoding to represent it in ASCII.

  4. Very nice article! Poetry… Hmm, I could give her a poem, even one she could read, but because she can read it, I suppose it will not be “rich” enough… Conclusion: Things she can not read have a deep meaning, things she can read are shallow. I suppose she can read her own writings?

    AFAIK, schools in the PRC are teaching Hanyu Pinyin, like those in Japan are teaching Hepburn. Here in Taiwan, no matter what currently is declared to be the official system, no romanization is taught (except for example individual departments at schools like Wenzao), only Zhuyin, starting at kindergarten.

    Funnily, some people here in Taiwan complain about “Zhuyinwen”, Chinese text (sometimes just a syllable, sometimes a few words) written in Zhuyin transcription on BBSs, forums etc. It seems easier (OK, and somehow “fashionable” too) for some younger people to use the phonetic script they learned since kindergarten than selecting the correct Hanzi character. Had they ever been taught Hanyu Pinyin, they could easily communicate that way.

    In games like Enemy Territory, where complex characters were not supported, it was always funny to see how players from the PRC could easily communicate in Chinese using Hanyu Pinyin (Right, they did not exactly write poems then…), while players from Taiwan – had to use English…

    So if someone had dropped out of school early, only learned a number of Hanzi (Enough? Enough for what?), never learned English (which means elementary school only), then that person would still have a hard time putting in Hanzi on a computer: Even some so-called academics have their problems with IMEs. At least one has to be learned, and that is something separate from learning Hanzi. Zhuyin is (in theory) the easiest in Taiwan, Hanyu Pinyin would be equally easy (Better even, since the characters are not blocking keys otherwise needed.), if only it would be taught.

    A lot of Taiwanese actually have noticed more or less that phonetic scripts are easier to use than Hanzi, the problem is they do not recognize this, because they consider both Hanzi and Zhuyin as “Zhongwen/Zhongwenzi”. Seeing how fixated some people are on Hanzi, I seriously wonder how they can understand what I say when I am not wearing a hanzi subtitle display across my chest…

  5. Wonderful post, pity lots of people will have read about magical Chinese from that NYT article.

    What they should have done is get her to try and explain the etymology of the character ? and how it relates to the meaning. This was the character that made me give up looking for character etymologies because the explanation made less sense than just memorising the strokes!

    This was great:

    A: “Baidu? Sorry, could you write that in Chinese characters for me. I can’t tell if you mean “a failed attempt to poison,” “making a religion of gambling,” or the search engine.”

    It needs to be translated back into Chinese so they can enjoy it too!

    Wow, some people really pronounce “dew” and “due” the same as “do”.

    AFAIK, schools in the PRC are teaching Hanyu Pinyin, like those in Japan are teaching Hepburn.

    When I was teaching there they were teaching Kunrei in elementary schools and Hepburn in junior high schools.

  6. Your main problem here is investing in any sort of belief that the new york times is more able to put out decent material than anyone else. There’s a reason people read your webpage and not the Nytimes for chinese info that isn’t shyte. Major media companies kinda suck at adapting and changing and not putting out poop.

    As for IMEs, rare is the day that you can find people in china, who use computers, who are under say, 30, maybe 35, who use anything except pinyin. I would guess IME usage statistics put pinyin at well over 95 percent.

    Regardless of anything, I think everyone should be happy that another non-latin alphabet is getting support. It’s certainly long overdue.

  7. For whatever reason, most mainland Chinese I’ve met, young or old, aren’t nearly as fluent with Hanyu Pinyin as with characters even if they were taught it in school. Typing 百度.com does indeed require one more step than typing baidu.com if the person is using Pinyin. But having a domain name in Hanzi makes it much more memorable to your average Chinese than in English or Pinyin. This is why many domain names in China are either abbreviations or numbers.

  8. @Weili,
    How does it make it more “memorable” when they still have to perform the same action to get to the website, ie type the letters “baidu”? Are you seriously suggesting they have the ability to type letters but lack the ability to read them?
    I would be willing to believe that people will find reading whole texts in Hanyu Pinyin difficult because they aren’t used to it. However, I’m skeptical that URLs would be a problem.

  9. @JB,
    They don’t have to type “baidu”. Using Google’s pinyin IME, just typing “bd” will get you there. Also, it’s true what Weili said about Chinese people’s fluency in reading hanzi vs. latin letters. To them, they really have to stare at latin letters to get the meaning, whereas just barely a glance at hanzi is all that’s needed — exactly the reverse of us.

    I’m not saying I have no reservations about IDNs — I do. Whereas all Chinese users can at least get by with alphanumeric domain names, westerners would be completely ??ed when it came to trying to A, transcribe; B, dictate; or C, enter into a computer any hanzi domain name. There’s a definite asymmetry that’s very worrisome, in my mind.

    Brilliant post, Mark! Very fun to read.

  10. I laughed til my stomach hurt. Very fun article.

    I have to admit, however, that even I find it easier to read Hanzi than pinyin, and I’m not even native to Mandarin (or any Sinitic language).

  11. Dude! You like so totally don’t get it. Didn’t you watch Hero?

    There is a problem with these domain names though – as pointed out by others, you are restricting access by putting the addresses in a script that only a portion of web users can read. If only there were some simple system of symbols that were easy to learn and could be used by people around the world regardless of their native language. I’d say about 26 of these symbols would be optimal to sufficiently represent sounds in all kinds of languages. If only.

    I’ve noticed, and I’m sure you have too, that the arguments put forward in support of characters vary depending on whether the proponent can read them or not. The “magic conveyance of meaning” myth is prevalent among non-literates – strange, as you’d think the very fact of their illiteracy would be disproof enough. The “diverse dialects, single written language” myth (and its “Unity of Chinese” variant) has strange currency among character literates, even Cantonese and Taiwanese speakers who again ought to know better from their own experience.

    When I speak to you in person, I see great passion alongside the understanding. It’s good to see that same passion on the page. More rants, Sir!

  12. @Taffy

    Why would a smart business purposely limit the number of people who can access its website? In the beginning, and in the foreseeable future, domain names in non-Roman scripts will almost strictly be secondary, as in redirects. Even if a business only has a non-Roman script domain, that would mean it would probably get less business than a competitor who has both. In the end, things work themselves out.

    The very fact that you and others are even against the existence of domain names in non-Roman scripts is actually quite alarming. Arguing that domain names in non-Roman scripts will not be accessible to those who only use the Roman script is like saying that everyone should only speak English because otherwise you won’t be able to understand them. This quite honestly makes me sick.

  13. Weili,

    your analogy is so off it’s insulting.

    Saying that saying that domain names should be ASCII is like saying that everyone should only speak English is like saying that by writing your reply you’re in favor of killing puppies. Which raises the question: Why do you hate puppies? =)

    What we’re saying is that the domain names should be limited in terms of characters used. This is completely different from the pages themselves and as the article you’re commenting on pointed out, baidu being in Latin alphabet didn’t stop them from using Chinese characters everywhere else.

  14. Do those against IDNs understand that all Unicode domains can be mapped to Ascii using a system called Punycode? Yeah, if you just saw the domain printed on a billboard you couldn’t enter it, but by the same token you couldn’t read the billboard, so how would you know you wanted to go to the site?

    I understand that sys-admins may need to visit sites they can’t read, but that’s when you can use Punycode.
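
    The Punycode mapping mentioned above can be tried directly. The following is a minimal Python sketch, not something from the original discussion. It assumes Python's built-in idna codec, which implements the older IDNA 2003 rules rather than the current ones, and the helper names to_ascii and to_unicode are made up for illustration.

```python
def to_ascii(domain: str) -> str:
    """Map a Unicode domain name to its ASCII (Punycode) form, label by label."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Map an ASCII (Punycode) domain name back to its Unicode form."""
    return ascii_domain.encode("ascii").decode("idna")


hanzi_domain = "\u767e\u5ea6.com"  # the characters for Baidu, plus .com
ascii_form = to_ascii(hanzi_domain)

print(ascii_form)              # an "xn--" label followed by ".com"
print(to_unicode(ascii_form))  # round-trips to the original name
```

    The ASCII form is what is actually looked up in DNS, so anyone who can copy that string can reach the site without an input method for the original script.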

  15. @bleh

    That’s like saying “Fine you can speak non-English languages, but you must all adopt ‘English names’.”

    I admit, that is better than “Speak only English and nothing else” but still pretty bad.

    Again, a smart business would not purposely limit the number of potential visitors to its site. Having an ADDITIONAL domain name in Chinese would only make it more memorable for potential Chinese visitors and therefore increase overall traffic. Those who can’t read Chinese would still be able to use the site’s main (or redirect) URL that is in Roman letters.

    If the business ONLY has a domain in non-Roman script, they probably don’t want your business anyway.

  16. Yes, pretty easy: Just enter the domain name into a converter and it will show you the punycode. Ah, you don’t know how to key the domain name? Well…

    I have an egg, anyone has a hen?

    If it is linked anyway, why the other characters then? Name the link accordingly, in any language, in any script. The URL itself does not need to be affected.

    But if it is not linked, how would anyone want to enter it if the computer he is using does not support the keyboard layout/IME? The current characters are supported by any computer, even my Amiga 1200 can handle them! But I am not sure if I can install a Japanese IME in a German internet cafe…

  17. @dl7und

    If it’s linked, the domain name could technically be in any script just as you said. But the purpose of IDN isn’t for links, it’s mostly to be used on printed material like a poster at the subway station and people memorize names easier in their native language.

    For the third time, smart businesses (those with sites you most likely would go to anyway) would have multiple domain names and at least one in the more “traditional” Roman letters, so the point is moot.

  18. Pingback: Chinese Characters: not so magical | Sinosplice

  19. Chinese languages are no less suited to Chinese characters than Western languages are to Latin alphabets. Does the author not understand that Chinese languages are all analytic—very different from western descriptive languages? In English we distinguish homonyms via context or spelling. All similar sounds in Chinese languages are THE SAME. Words do not morph spellings in Chinese languages because they are anchored in their analytic roots.

    Saying Chinese characters are not magical might be enough to refute a wide-eyed Eastern fetishist, but what about all the Chinese poets who played with meanings and sounds of characters? Is their delight in the nuances of character radicals meaningless simply because you yourself could care less about it?

    I understand the author’s intent is to push back against the naiveté of Xu, but I’m afraid the author goes beyond that, hinting at superiority in phonetic alphabets. I suppose my point is that just because you see characters as no more than a means of expressing a spoken language does not mean that they cannot have more meaning to someone else.

    NB I’d like to add that pinyin input method for Chinese characters is not the only input method. There are others based on radicals, which are faster to use when one really understands how characters work, especially since characters do not really change even when dialects do.

  20. I read the NYT article, John’s post, and your post regarding Chinese characters. I find a striking difference between the western definition of “language” (from you, Mark and Wikipedia) as opposed to the Eastern definition of “language”, or at least what you perceive as the meaning of language. Unlike the alphabet, the Chinese language was never supposed to represent the spoken word. Prior to developing their own written languages, many countries in Asia used Chinese characters as a way to communicate with each other. These days, only a handful of countries still employ all or some of the characters to some degree.

    I have never learned Korean, but many say that their alphabet system is the most logical and makes for the easiest of learning. It essentially replaces characters and pinyin with how words should sound, but while I can’t make my way around Korea looking at the characters, I can get around pretty far in Japan.

    In the early 1900s, people argued for the elimination of characters and adopting solely a written language based on Romanization. But to be honest, China is still relatively a multi-dialect country; even with the domination of Mandarin, many people will still pronounce a word that is of local dialect. Who’s to say that ?? is supposed to be pronounced “haizi” instead of “shazi”? You can say that “formally” ?? is pronounced “haizi”. One actually dictates how the word is supposed to be pronounced, and the other does not. While China is not exactly shy about making everyone say something the same way, perhaps maintaining a written language that does not represent how words are supposed to be pronounced gives people some leeway as to their spoken language.

    Let me give you an analogy in English. If we take the word “tomato,” for example, should we write it as “tomayto” or “tomahto”? Are they actually different things if we write it one way or the other? Is a tomayto not a tomahto? And if a tomahto “wins out” in the pronunciation game, do we then have to switch our dialect and say “tomahto” instead? It is much easier to refer to tomato as tomayto, but with tomahto, it’s hard to get around that. That’s a pretty simple example; as you can tell, it can get much more complicated.

    People speak as if the English language makes a lot of sense, that everything in it is very easy for people. But English is not easy; it’s made up of a number of different languages and is not easy for someone who’s not of English descent to learn. The Christian Science Monitor did an article a while back on whether we should make English spelling easier.

    And I quote from them “Wiel wee ar at it, wee might az well cleen up ar eeraygyoolar verbz, mayking the preterit and past participal the saym in awl kasez and regyoolarizing awl ar plooralz. Bi then wee wil hav noe dowt bringed the literasee levelz of Inglish speekerz up to thoz uv other kuntreez.

    The reezult would bee a langwadje that iz konsistent, lojikal, and abil too bee eezily understud by evereewon. Jorj Bernerd Sha wood bee prowd.”

    As English is actually pronounced by people now, this is what it should look like in writing, but in reality it’s far from it. Now imagine how that sentence would look if it were based on a British accent? A Scottish one? How about south-eastern Boston? Or should I actually write “Basten” if I were from “Boston”, and Boston if I’m from Michigan? I think you get my point. I personally don’t know if it’s better to switch the written characters to some other system that may be easier for people to learn. I’m not sure if a language has to be “poetic” or “technical”. I personally feel like Chinese characters have served well for many centuries; there may be need for gradual changes, but there are many aspects to be considered.

  21. Weili and Klortho, good point, but it’s not forcing everyone to speak English because the Roman alphabet is not “English”- it’s used by all Western European languages (afaik), as well as Bahasa Indonesia and I believe others. It is also widely recognized by speakers of other languages. This may be unfair, but I’m unconvinced that unfairness outweighs the benefits of having a common script for urls.
    Kong and InF, characters change with dialects. I am able to read academic papers written in Mandarin, but I am unable to read many simple phrases written in Hokkien and Cantonese.
    Also, are you saying there is no context in Mandarin? How else are homonyms distinguished in spoken Mandarin?
    If you’ve read much of this blog you’d realize that the author does in fact think alphabets are superior to characters. Typically speaking characters are simply a form of written communication. Even Chinese poetry is bound to spoken language- that’s why many people say Tang dynasty poetry sounds better in Hokkien or Cantonese, because poets wrote according to the sounds of ancient Chinese, and those two languages have preserved the sounds of ancient Chinese better than Mandarin. Regardless, for most purposes I don’t see how characters possibly have any more meaning than an alphabet would, except as a symbol of Chinese culture.
    InF, it is exactly because Chinese characters were never intended to represent spoken language that they are less efficient than alphabets.

  22. @InF
    Don’t you think that some of your statements are a bit odd?

    “the Chinese language was never supposed to represent the spoken word” So, “spoken words” are not language? Are you sure that you are not confusing language and writing system? And if with “Chinese language” you mean things like “??”, did you notice that according to you, the Japanese people use “Chinese language” to write the Japanese language? Don’t you think that this sounds a little bit weird?

    “It essentially replaces characters and pinyin with how words should sound” I always thought that “pinyin” ???? would mean to arrange sounds, to put them together (to form a word, for example). Every alphabet is basically “pinyin”, because it does exactly that: ??????. Zhuyin is a kind of pinyin, just with characters different to Hanyu Pinyin. Japanese kana are a kind of pinyin, because they only represent sound.

    “Prior to developing their own written languages, many countries in Asia used Chinese characters as a way to communicate with each other.” “Written language” is a strange concept, and the result is the two terms people here in Taiwan use for most languages: ??/??, ??/?? etc. Do you know any other language that needs two terms to name a language? I don’t. People do not need a “written language”, they need a way to write down their language – which can be done in many forms. Vietnam had been part of China, so “naturally” (surely with some motivation from the emperor) they used the same characters. Later they found that the characters used by the French were much easier and better suited, as they indeed allowed the sounds of the language to be represented. Do you think English can only be written down with the characters I am using now? -.-- --- ..- .- .-. . .-- .-. --- -. --. .-.-.-

    “I find a striking difference between the western definition of “language” (from you, Mark and Wikipedia) as opposed to the Eastern definition of “language”” Please do not think that many countries in Asia share your view, there is no “Eastern definition of language”. Even between the PRC and Taiwan there are many differences in linguistic terminology and views, despite the “same language”. Japan sees things very different from Taiwan, although they partially use similar characters. Vietnam? Indonesia? Philippines? Mongolia? Nah…

    And please, don’t try to make sense of English pronunciation. That would still have worked about 1000 years ago, but not these days. If you learn a few European languages, you may find that English pronunciation is rather irregularly reflected by its spelling. If you want a very regular (in terms of spelling portraying pronunciation or vice versa) language, try Italian, Russian, or even better Hungarian. There is a good book named “Inside Language” (by Vivian Cook) explaining (among others) the problems with English pronunciation quite nicely.

  23. “Also, are you saying there is no context in Mandarin? How else are homonyms distinguished in spoken Mandarin?”

    In the worst case scenario they are distinguished in reference to characters. For example, since the last name Zhang can be represented by two different characters, to indicate ? or ? a reference to its radical construction would be used, just as distinguishing the names Goldman and Goldmann would require elucidation of their spelling.

  24. That seems to be a rare exception to me. Names are bound up with personal identity, so it does matter to people how their names are written/ spelt, but for emotional rather than practical reasons. In almost any other situation however you don’t need to be told the characters once you hear a word in context. As Mark illustrated in his “baidu” example, typically the meaning of homonyms is clear from context. I personally can’t remember the last time I saw someone write down the character or shouxie a character to make their meaning clear, unless it was to introduce new vocabulary.

  25. Yes, context makes the meaning clear – at the same time that it makes the characters clear. No one is worried his request to perform a Baidu search will be misunderstood. But ideally, as the other party parses “Baidu” to mean “that search engine”, the characters and their associated meanings should pop up as well, however briefly.

    Not to do the “like, wow” thing here, but language is indeed enriched by associations. In the same way that it’s easier to recall John’s profession than his last name, in the same way that synesthetes display elevated memory, the more associations we can draw to a given word or phrase, the more richly it is processed by our brain, and the better memory we’ll have of it later.

  26. I suppose my point is that just because you see characters as no more than a means of expressing a spoken language does not mean that they cannot have more meaning to someone else.

    Exactly. I find the Xu-bashing here repellent and unnecessary. Yes, she has delusions about language — delusions shared by virtually everyone who has not taken a course in linguistics. I myself am as far from a worshipper of the Mystery of the Chinese Character as you can get; I have repeatedly said China should switch to pinyin altogether and leave characters for specialists in classical literature. But I do not pretend that nothing would be lost by this. A great deal would be lost; it is only the greater good of universal literacy that forces me to the conclusion that on balance the change would be better. But to mock the author for regretting the loss of the poetic resonance in the Chinese phrase when turned into a meaningless pair of syllables suggests a childish inability to see more than one thing at a time. She has a point and she is not an idiot; she simply (like the vast majority of humanity) does not have a scientific understanding of language, and this is not her fault.

  27. Does she think quixotic is a portmanteau of quirky + exotic? Or what trait would all the languages of the world have in common with the cosplay knight?

  28. InF,

    I have never learned Korean, but many say that their alphabet system is the most logical and makes for the easiest of learning.
    Many as in who exactly?

    It essentially replaces characters and pinyin with how words should sound
    You do realize that hangul was created more than 500 years before the invention of pinyin, don’t you?
    Moreover, just like the Roman letters in pinyin, the letters of hangul represent sounds and save for some cursory visual similarity, hangul has no connection with hanzi.

  29. Hat, I agree with you that switching to Pinyin would be a Good Thing for China and the other Sinitic countries. But it must be recognized that modern written Mandarin is neither pure baihua (which can always be written in Pinyin) nor pure wenyan (which cannot); it’s a hybrid between the two, mostly baihua but with the amount of wenyan content and influence rising as the language register goes up. Taking a contemporary high-register text in hanzi and transliterating it to Pinyin will render parts of it, possibly critical parts, as unintelligible as Yuen-Ren Chao’s famous “Lion-Eating Poet in the Stone Den”.

    So advocating for Pinyin is necessarily also advocating for pure baihua writing everywhere and always, a substantial change in the lexis and grammar of written Mandarin, not merely its orthography.

    InF: There are many improvements that could be made to English spelling without making it specific to any one dialect (I agree that many spelling reformers propose exactly this). For example, “friend” always rhymes with “end” in every dialect, and there is absolutely no reason, except pointless fidelity to ancient tradition, not to spell it “frend”. Similarly, “have” does not rhyme with “pave” in any dialect, and there is no reason, except an old rule that dates back to when U and V were the same letter centuries ago, not to write it “hav”. A systematic search for such improvements would still leave English spelling subject to complicated rules, as French spelling is; but it would dispose of most spellings that agree with no rule at all, as French has done (a few words like “oignon”, which is pronounced as if it were spelled “ognon”, still survive).

    Nongandwong: In fact, a majority of native English-speakers pronounce “do”, “due”, and “dew” the same (basically everyone in North America), and a very small minority pronounce them all differently (mostly in Southern Wales and parts of the North of England), which was the original situation and explains why they are spelled differently. The remaining large minority pronounce “due” and “dew” the same and “do” differently.

    Dl7und: There are many languages whose speakers use a different language for communication in writing. Although matters have diverged since, Japanese people originally did not begin by writing Japanese in hanzi/kanji; rather, they simply wrote in Chinese and left their own language unwritten. Until May 4th, modern spoken Mandarin was just another such language.

  30. Well, actually I wouldn’t expect people to type ??.com, I would expect them to type ??.?? instead. Why make folks who have their keyboards set to use Chinese characters switch to Latin to navigate Chinese-language websites? It’s not a matter of poetry, but of practicality. For everyone else who wants to get to that website, there’s Google.

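  31. niuyueshibao shangmian zhe pian wenzhang dique youxie wenti. zuozhe taiguo kuada le hanzi suo yuhan de yiyi, jiu suan baidu de mingzi lai zi songci, dan shi baidu ziji de wangye sheji litou, ye kan bu chu nali xiang yao biaoxian zhe yi dian. jiran baidu dou mei zhege yisi, ye bu dasuan yongyou hanzi yuming, zuozhe ziji de zhongwen duxie nengli ye youxian, jiu rang ren namen, bu zhidao ta wei sha pian yao zhe me jianchi fei yao yong hanzi bu ke. er qie dang baidu zhe liang ge zi cong songci litou chouqu chulai zhihou, bian cheng le sousuoyinqing de pinpai mingcheng, ye yijing bu shi yuanben de yisi le. yong luoma zimu pinxie chu lai de baidu ben lai jiu mei you yao rang ren xiang dao xinqiji de qitu, jiu zhi shi yi ge pinpai eryi.

    [In English: The article in the New York Times does indeed have some problems. The author greatly exaggerates the meaning contained in Chinese characters. Even if Baidu’s name comes from a Song lyric, nothing in the design of Baidu’s own web pages shows any wish to express that. Since Baidu itself has no such intention and does not plan to have a Hanzi domain name, and since the author’s own ability to read and write Chinese is limited, one is left puzzled as to why she insists so stubbornly that Hanzi must be used. Moreover, once the two characters of “Baidu” were taken out of the Song lyric and became the brand name of a search engine, they no longer carried their original meaning. “Baidu” spelled out in Roman letters was never meant to make people think of Xin Qiji in the first place; it is just a brand.]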

  32. Mark Swofford may be a bit of a curmudgeon, but don’t we need public consciences to protect us from those who romanticize, exoticize, and distort what they don’t understand? And, when it comes to linguistics, are we to let sentiment rule over science?

    Comment from one of my graduate students: My response to Language Hat’s comments might be that it is fine not to know, but not fine to pontificate at length as if you do on the pages of the NY Times (who should really share some of the blame). The description of the novel’s plot, incidentally, was also nausea-inducing.

    (Because some of the participants in this debate have commented on two or more of the relevant sites [LL, LH, and Pinyin News; i.e., there is a triangular discussion going on], I am cross-posting this comment to all three sites.)

  33. “This may be unfair, but I’m unconvinced that unfairness outweighs the benefits of having a common script for urls.”

    I remain unconvinced that there are *any* benefits for having another common script for URLs in addition to Punycode. Seriously, what is the use case?
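
For readers unfamiliar with the Punycode mentioned above, here is a minimal Python sketch of how a non-ASCII host name maps to the ASCII form that DNS actually carries. It relies on Python's built-in `idna` and `punycode` codecs, which follow the older IDNA 2003 rules, so results for some edge-case names may differ from what current browsers do. The host name `bücher.example` and the two helper function names are illustrative assumptions, not anything taken from this discussion.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Encode each dot-separated label to its ASCII ("xn--") form."""
    # Python's built-in "idna" codec implements IDNA 2003 (assumption:
    # that is close enough for an illustration).
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(ascii_hostname: str) -> str:
    """Decode an ASCII-compatible host name back to Unicode."""
    return ascii_hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "bücher.example"  # hypothetical host name
    encoded = to_ascii_hostname(name)
    print(encoded)                       # the form sent to DNS
    print(to_unicode_hostname(encoded))  # back to the readable form
    # The raw Punycode of a single label, without the "xn--" prefix:
    print("百度".encode("punycode"))
```

The point of the sketch is only that the ASCII form already exists underneath every internationalized name; whether users should ever have to see or type that form is the question being argued in this thread.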

  34. Isn’t there something just a little unseemly about such an emotional, sarcastic, and biting retort about Chinese people’s romanticism for their own writing system?

    I wouldn’t argue that pinyin couldn’t be an alternative, but even if I strongly believed it was the obvious Better Way, I can’t see myself getting this worked up over it. A frustrated Guoyu teacher in Taiwan who hates all the time he is forced to “waste” teaching his students their characters, fine. But when I imagine it the other way around, an angry, all-Mandarin flamewar over the “obvious” need to reform English spelling, the high-strung tenor of the criticism starts to seem a little overdone. Even admitting that there is a lot linguistically suspect about the original article, does the fuzzy romanticism other people feel for something culturally important to them actually upset anyone that much? Really? Where would that road end if we pursued it far enough? Would it really pass us by?

    Besides, whatever one may think about the debate, there’s no denying that the viewpoint she presents does represent the way quite a lot of Chinese people actually feel about their writing system. And it is, as has been pointed out, a small op-ed piece, not a serious article with intent to spread accurate knowledge about the nature of the Chinese writing system.

  35. I recently had a discussion about this with my aunt, who majored in Chinese literature back in college (in China). Her argument for hanzi (versus mine for pinyin–I am also illiterate) was cultural, as has already been enumerated. I can certainly see the unification effects of written Chinese, since dialects continue to flourish and it’s anthropologically problematic to argue for their extinction (pinyin is based on the phonetic pronunciation of putonghua, which I would assume is the -first- language of a minority of the people living in China today). While extended dialects such as Cantonese may invent new characters, this simplification fails to account for the numerous rural dialects that vary enough for difficulty in understanding (or even just enough to differ significantly from putonghua) but not enough to necessitate changes in written Chinese characters.

    On a side note, one of my aunt’s friends commented to me that she has much greater difficulty reading English signs IN ALL CAPS than the same words in lowercase. My theory is that uppercase and lowercase letters register in her brain as different “characters.”

  36. Danny Bloom said,
    July 12, 2010 @ 9:19 am

    One thing readers here should note and be aware of: this NYTimes oped piece was assigned to Ms Xu because her agent told the Times oped desk that Xu has a new novel coming out in October and it sure would be cool to get a kind of free, unpaid advertorial mentioning the novel’s title in the author’s ID in the print edition of the Times as a good way to spread news of the upcoming novel’s pub date. Capice? This is how the Times does business. Xu wrote that piece for the Times because the Times commissioned it because her book agent asked for it and there it is. These opeds don’t just appear magically out of the blue. It is all agenting and marketing behind it all. I know. The Times pretends these pieces are sent in cold, they are not. Every oped in the Times was assigned and commissioned, ask David Shipley there, he will dish. And this oped was not even about her new novel, so why in the world was it assigned to her? Guess. Name awareness. She is now a star, as “published in the New York Times” the early reviews will say. The Times should be more transparent, one, and the public should be more aware of how these oped shenanigans work.

  37. Maybe the argument is that by using an alphabet rather than characters that Personal Names lose “meaning”? In English, I imagine that most of us have to refer to a baby-name book to recognize the meaning of names such as John or Sue or Barack. I seem to remember hearing that in Korea, which uses a phonetic script, that Chinese hanzi are often associated with one’s name to add some “meaning” to the choice of name.

  38. @elessorn: it isn’t her writing system. She can neither read nor write it. The fact that she is ethnically Chinese, and spent the first ten years of her life in China, does not qualify her to speak about aspects of Chinese culture that she has never experienced first-hand.

    To those others who still hang on to the belief that Mandarin speech somehow represents characters: how then is it that Chinese children learn to speak before they learn to read and write?

  39. “To those others who still hang on to the belief that Mandarin speech somehow represents characters: how then is it that Chinese children learn to speak before they learn to read and write?”

    I would say that literary Chinese neither represents any one form of spoken Chinese, past or present, nor is represented by it. Literary Chinese just was an independent language heavily influenced by the many varieties of spoken Chinese over the centuries. Modern written Chinese is more obviously tied to Mandarin, but still not quite identical to it or merely a representation of it though it’s trending in that direction.

    “While extended dialects such as Cantonese may invent new characters, this simplification fails to account for the numerous rural dialects that vary enough for difficulty in understanding (or even just enough to differ significantly from putonghua) but not enough to necessitate changes in written Chinese characters.”

    Arguably, if pinyin replaced characters, Cantonese speakers would learn that the sound “hak” is written with the letters x, u, and e just as easily as they now learn that it is written with the character ? and just as easily as English speakers accept the useless a in “learning.” (Admittedly this is an especially hard case, since Mandarin has completely changed the sound for ? compared to other Sinitic influenced languages.) Diglossia is a widely attested phenomenon, and even English exhibits it to a degree with bizarre spellings like the re in “bizarre” and so on.

    Arguably, having pinyin would spur more interest in writing vernacular texts in dialects rather than just writing everything in the semi-foreign language of putonghua.

  40. If she cannot read or write Chinese, she says so herself, how can she write in the NYTIMES…**we we we we

    isn’t she being a bit dishonest here with the Times editors and Times readers, since from her name we assume she is Chinese literate? WEIRD. no?

    “When Chinese speakers Baidu (like Google, it too is a verb), **we** look for information on the Internet using a branded search engine. But when ***we see the characters for bǎi dù, ***we might, for one moment, engage with the poetry of ***our language, remember that what ****we are really trying to do is find what ****we were seeking in the receding light. Those sets of meanings, layered like a palimpsest, might appear suddenly, where ****we least expect them, in the address bar at the top of our browsers. And in some small way, those words, in our own languages, might help us see with clarity, and help us to make sense of the world.”

  41. I politely wrote a nice note to Ruiyan in NYC about her website and told her that one remark about her work had a typo and maybe correct it later and she did not reply to me or correct it.
    It’s not a big t*po, and I make t*pos all the time, and I just wanted to help her. But so far she ignores me. Juen should be JUNE, Ruiyan and of course you meant to type that in, but yr fingers touched the wrong notes. Fix. Page will look better that way.


    “Juen 28, 2010: Publishers Weekly says ”The Lost and Forgotten Languages of Shanghai” is promising debut.
    [Promising Fiction Debuts]“

  42. Has anyone ever pointed out that Baidu could be a ripoff of Google, which is in turn a play on the word googol, which is defined as a number that is equal to one followed by 100 zeros? And couldn’t du also mean “degrees”? 100 degrees of search power? That was my impression when I first encountered Baidu.

    I have to agree with some of the comments made earlier. With Chinese names for example, I find it extremely difficult to remember names when they’re written in Pinyin. But once someone writes their name out in Chinese characters for me, I immediately catch on. All of a sudden, their names mean something and not just a bunch of Xs, Zs and Ys. So I do think that using Chinese characters instead of pinyin in certain situations is useful and more “precise”. I have also found that people in China who are in their 50s and 60s are frequently unfamiliar with Pinyin.

    Curiously, I also came to the US when I was 10 but somehow have found it impossible NOT to still be able to read Chinese characters.

  43. Don’t get me wrong, I like pinyin, but wouldn’t the Chinese language get very confusing if written purely with pinyin? However you want to rant Mark, there are many more homonyms in Chinese than in English… You would end up losing meaning.

    I find, as a learner of Chinese, that the characters help me tie down a memory about particular words, whereas I just get overloaded with all the pinyin combinations. They are too similar, and don’t have enough differences to help with remembering. If Chinese was going to be written in a Latin script I think they might as well come up with a way of incorporating the tone into the Latin spelling, rather than relying on diacritical marks which are problematic on the web.

    This is all pretty academic anyway. China are not going to change their system of writing anytime soon. As Carl commented above, Chinese characters are “ideographs” and whether we like it or not, Chinese people think in characters, it is central to their language. We probably just need to get over ourselves.
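
One existing way of writing the tone into plain Latin letters, as the comment above wishes for, is the tone-number convention (for example `bai3 du4`). The following is a minimal Python sketch, assuming the input is Hanyu Pinyin with the standard tone diacritics and that each whitespace-separated token is a single syllable; the function names are made up for this illustration, and the sketch does not try to split multi-syllable words.

```python
import unicodedata

# Combining marks used for the four Mandarin tones in Hanyu Pinyin.
TONE_MARKS = {
    "\u0304": "1",  # macron (first tone)
    "\u0301": "2",  # acute  (second tone)
    "\u030c": "3",  # caron  (third tone)
    "\u0300": "4",  # grave  (fourth tone)
}


def to_tone_number(syllable: str) -> str:
    """Turn one tone-marked syllable into its tone-number form."""
    # NFD splits a letter such as "ǎ" into "a" plus a combining caron.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = ""
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps the diaeresis of "ü"
    # NFC puts "u" + diaeresis back together as "ü".
    return unicodedata.normalize("NFC", "".join(kept)) + tone


def convert_text(text: str) -> str:
    """Apply to_tone_number to each whitespace-separated syllable."""
    return " ".join(to_tone_number(token) for token in text.split())


if __name__ == "__main__":
    print(convert_text("bǎi dù"))
```

Syllables without a tone mark are passed through unchanged, which is one common way of treating the neutral tone; other conventions append `5` or `0` instead.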

  44. Pingback: The Character of Languages | Like Them that Dream

  45. Wow. So after reading through a few of the sample readings, then suffering through this blog post, I’ve finally discerned the motive behind this site. Apparently this guy hates characters, either he doesn’t understand them, or is just so lazy he wants everyone to use the Roman alphabet and Arabic numbers.

    Fact is, that even as a relatively new learner of Chinese, I find the characters much much easier to understand than pinyin. Sure, pinyin may help a new learner with pronunciation, but that is not the sole purpose of writing… Meaning should be inherent as well, which is why, in English, most homonyms are spelled differently, so we can discern the meaning, not by context, but by what we see in writing, i.e. “by” versus “buy”.

  46. “To those others who still hang on to the belief that Mandarin speech somehow represents characters: how then is it that Chinese children learn to speak before they learn to read and write?”

    Isn’t the same true for MOST children? I’m just wondering if anyone has ever met a child who learned to write before speaking. That’s a miserably inadequate argument.

  47. “I can’t see myself getting this worked up over it. A frustrated Guoyu teacher in Taiwan who hates all the time he is forced to “waste” teaching his students their characters, fine”

    This is a bit of a confusing and misleading statement, given that they don’t use pinyin (or much romanization at all) in Taiwan.

  48. I should do a bit of research on the failures of the Hangul system in South Korea; they apparently have not been completely able to ditch Hanja for some reason, while North Korea is on a pure Hangul system. Vietnam may or may not have the same problems as South Korea in moving off a sinogram-based orthography, but I’m sure for political reasons any complaints would be suppressed after more than a millennium of Chinese colonialism.

    I also wonder if there’s an element of linguistic co-evolution in phonetic, syllabary-using, and hieroglyphic languages where the form of the orthography affects the evolution of the spoken language.

    For example, when you speak about being able to determine words through context, is this a desirable ambiguity? Or for clarity’s sake, would people prefer words with no ambiguous homophones? “Naturally” and “not really”, in certain pronunciations, sound extremely similar but denote completely different meanings. In this situation people would opt to avoid using “naturally” and “not really” as single-phrase replies, and that would affect linguistic evolution.

