Unicode tops other encodings on Web pages: Google

Google is reporting that in December 2007 Unicode became the most frequently used encoding on Web pages.

Just last December there was an interesting milestone on the web. For the first time, we found that Unicode was the most frequent encoding found on web pages, overtaking both ASCII and Western European encodings—and by coincidence, within 10 days of one another. What’s more impressive than simply overtaking them is the speed with which this happened.

Here’s Google’s graph:
graph showing percentage of pages in various encodings, 2001-2008, with ASCII starting about 56% in 2001 and declining to about 25% now, which is also about where iso-8859-1 and utf-8 are now

I wish Big5, the encoding most used for Web pages in traditional Chinese characters, had been included in the graph. And I suspect that it’s only within the past ten years — perhaps even within the timeframe of the graph — that more Web pages have been encoded in GB (used for so-called simplified Chinese characters) than Big5. (GB is shown on the graph in green.)

Of course, many (most?) Web pages don’t declare any character encoding. This is especially bad when they contain characters beyond the bounds of ASCII, since those characters will often end up rendered as garbage on systems different from that of the creator of the Web page.
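To make the "rendered as garbage" problem concrete, here is a minimal Python sketch. It assumes a page was saved as UTF-8 with no declared encoding and that the reader's system guesses ISO-8859-1. The `render` helper and the sample strings are hypothetical, chosen only for illustration; real browsers use more elaborate detection.

```python
def render(raw: bytes, guessed_encoding: str) -> str:
    """Decode a page's raw bytes using whatever encoding the reader's system guesses."""
    return raw.decode(guessed_encoding)


# The author types accented Pinyin and saves the page as UTF-8, declaring nothing.
page_text = "Guólì"
raw = page_text.encode("utf-8")

# A system that guesses UTF-8 shows what the author wrote.
as_intended = render(raw, "utf-8")

# A system that guesses ISO-8859-1 turns each two-byte character into two wrong ones.
as_garbage = render(raw, "iso-8859-1")

# Pure ASCII text survives either guess, which is why the problem only
# shows up once a page goes beyond the bounds of ASCII.
ascii_raw = "Pinyin News".encode("utf-8")
ascii_same = render(ascii_raw, "utf-8") == render(ascii_raw, "iso-8859-1")

print(as_intended)  # Guólì
print(as_garbage)   # GuÃ³lÃ¬
print(ascii_same)   # True
```

Declaring the encoding in the page removes the guesswork, so the reader's system decodes the bytes the same way the author's system encoded them.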

So … should I have a post focusing on Unicode without again berating the Unicode Consortium for its continuing unscientific, egregious, and unforgivable use of the term “ideographic”? I don’t think so.

source: Moving to Unicode 5.1, Official Google Blog, May 5, 2008

Crazy English in the New Yorker

The latest issue (April 28, 2008) of the New Yorker has an article on China’s Crazy English (Fēngkuáng Yīngyǔ / 疯狂英语) method: Crazy English: The national scramble to learn a new language before the Olympics, by Evan Osnos.

Li Yang Crazy English (as it is properly known, after Li Yang, the company’s founder, chief spokesman, and head cheerleader) uses untraditional and emphatic but not always proven methods, including shouting and vowel-associated gesticulations, to help students overcome their fear of using English and remember the sounds of their vocabulary words.

Chinese nationalism is also a big part of its approach.

From the article:

A long red-carpeted catwalk sliced through the center of the crowd. After a series of preppy warmup teachers, firecrackers rent the air and Li bounded onstage. He carried a cordless microphone, and paced back and forth on the catwalk, shoulder height to the seated crowd staring up at him.

“One-sixth of the world’s population speaks Chinese. Why are we studying English?” he asked. He turned and gestured to a row of foreign teachers seated behind him and said, “Because we pity them for not being able to speak Chinese!” The crowd roared.

Li professes little love for the West. His populist image benefits from the fact that he didn’t learn his skills as a rich student overseas; this makes him a more plausible model for ordinary citizens. In his writings and his speeches, Li often invokes the West as a cautionary tale of a superpower gone awry. “America, England, Japan—they don’t want China to be big and powerful!” a passage on the Crazy English home page declares. “What they want most is for China’s youth to have long hair, wear bizarre clothes, drink soda, listen to Western music, have no fighting spirit, love pleasure and comfort! The more China’s youth degenerates, the happier they are!” Recently, he used a language lesson on his blog to describe American eating habits and highlighted a new vocabulary term: “morbid obesity.”

Li’s real power, though, derives from a genuinely inspiring axiom, one that he embodies: the gap between the English-speaking world and the non-English-speaking world is so profound that any act of hard work or sacrifice is worth the effort. He pleads with students “to love losing face.” In a video for middle- and high-school students, he said, “You have to make a lot of mistakes. You have to be laughed at by a lot of people. But that doesn’t matter, because your future is totally different from other people’s futures.”

Very soon Sino-Platonic Papers will be issuing a long, critical study of Crazy English. Look for the announcement of that here in Pinyin News.


Pinyin in space

Stories about the official approval last September of the name of “Chiayi” for an asteroid/planetoid/minor planet (not to be confused with Pluto, the “dwarf planet”) discovered by astronomers with Taiwan’s National Central University drew my attention to the fact that another minor planet already bears the name of the university, and that they named it using Tongyong Pinyin: “Jhongda” (i.e., Zhōng-Dà, the short form of the school’s name in Mandarin, Guólì Zhōngyāng Dàxué).

There are plenty of planetoids bearing names in Hanyu Pinyin, e.g. Chongqing, Guangzhou, Guizhou, Beijingdaxue [i.e., Beijing Daxue], Beishida [i.e., Bei-Shi-Da], and Zirankexuejijin [i.e., Ziran Kexue Jijin].

Omitting spaces is common in the names as a whole, though some of them have spaces. And some have hyphens.

Although the statistics of diacritical characters in minor planets’ names (a list after my own heart) shows that, as of June 1997, 667 (4.83%) of the 13,805 named minor planets had diacritical characters in their names, I didn’t spot any Hanyu Pinyin names with tone marks. The mark for first tone doesn’t appear on the list even once.

I wish they’d followed Tongyong when naming asteroid Chiayi, because that way they would have ended up with the same spelling that Hanyu Pinyin uses: Jiayi. But I guess the solar system’s big enough for Wade-Giles as well.

Here are some Google search figures from Taiwan government domains.

  • 532 from gov.tw domains for “chia-i”
  • 1,380 from gov.tw domains for “jiayi”
  • 2,660 from gov.tw domains for “chia-yi”
  • 997,000 from gov.tw domains for “chiayi”

Should Ma Ying-jeou win next month’s presidential election in Taiwan, both the executive and legislative branches of government would be in the hands of the no-longer-opposed-to-Hanyu-Pinyin Kuomintang, and the national folly of Tongyong Pinyin could soon cease to exist as an official system not just in Taiwan but everywhere throughout the known universe … except on planetoid no. 145534 (“Jhongda”), a big chunk of rock in orbit somewhere past Mars.


Unusual Venezuelan names and the law

China has attempted to block personal and now place names with unusual Chinese characters and even prevent names using a perfectly usual Roman letter and numbers, but authorities in Venezuela are trying to limit personal names themselves to just 100(!), with exemptions for Indians and foreigners, according to an article in the New York Times.

Goodbye, Tutankamen del Sol.

So long, Hengelberth, Maolenin, Kerbert Krishnamerk, Githanjaly, Yornaichel, Nixon and Yurbiladyberth. The prolifically inventive world of Venezuelan baby names may be coming to an end.

If electoral officials here get their way, a bill introduced last week would prohibit Venezuelan parents from bestowing those names — and many, many others — on their children.

The measure would not be retroactive. But it would limit parents of newborns to a list of 100 names established by the government, with exemptions for Indians and foreigners, and it is already facing skepticism in the halls of the National Assembly….

The bill’s ambition, according to a draft submitted to municipal offices here for review, is to “preserve the equilibrium and integral development of the child” by preventing parents from giving newborns names that expose them to ridicule or are “extravagant or hard to pronounce in the official language,” Spanish.

The bill also aims to prevent names that “generate doubts” about the bearer’s gender….

Not everyone denounces the bill. Temutchin del Espíritu Santo Rojas Fernández, 25, a computer programmer, explained that his first name was inspired by the birth name of Genghis Khan, often spelled Temujin in English. He said he frequently had to correct the spelling of his name on official documents.

And in Venezuela, where the tax authorities require name and national identity number for every purchase needing a receipt, pronouncing and spelling out Temutchin del Espíritu Santo can get tiring, Mr. Rojas Fernández said. “With a name this complicated, you lose time,” he said.

“It also creates social problems,” he continued. “When interacting with others, not everyone can pronounce your name. I have to pronounce my name five times and spell it twice.”

source: A Culture of Naming That Even a Law May Not Tame, New York Times, September 4, 2007

Chabuduo jiu keyi?

When it comes to signage and much else in Taiwan, the phrase chàbuduō jiù kěyǐ (差不多就可以) might qualify as the country’s unofficial motto. “Close enough for government work” is probably the best idiomatic translation.

The railway-station sign in this photo in many ways exemplifies this.

Hsinchu Jhubei Shiangshan

Rather than list all of the errors and oddities of this sign, I thought I’d let readers have a go at this one. How many errors and problematic points can you find?

Grace Lee — the name, the movie

Korean-American filmmaker Grace Lee has made a movie about her own very common name, those who share it with her, and what cultural implications it may have, both in the West and Asia.

Here is the opening of one reviewer’s description of the film:

Smartly counterprogrammed opposite the orientalized depictions of Asian femininity in Memoirs of a Geisha, The Grace Lee Project is a breezy first-person video essay that goes in search of the average Asian American woman, all the while wondering if there is in fact such a thing. Early in her documentary, filmmaker Grace Lee points out that almost everyone knows a Grace Lee, and what’s more, is inclined to describe her the same way: nice, intelligent, quiet, sweet, studious, sort of forgettable. (Oh, and plays the violin.) Even G.L.’s often think of other G.L.’s—and of themselves—in those nondescript terms. Intrigued and disconcerted by the oppressive commonness of her name—and even more so by the perceived attributes that cling to it—Lee sets out to humanize the sociocultural abstraction and statistical mean that is “Grace Lee.”

Although the film premiered in late 2005 and received good reviews, it is not yet commercially available on DVD.

Courage… Cabnap… Grunplitk: zhuyin and the movie Fearless

Many Westerners are so attracted by Chinese characters, which tend to be absurdly exoticized as symbols [sic] or ideograms [sic] of deep meaning, that they place them here and there as if they were some sort of pixie dust that bestows coolness upon any object (or body). Often when they do so, they write these characters incorrectly or are mistaken about their meaning, as Tian of Hanzi Smatter continues to note. But you’d think that at least those who make trailers for Chinese movies would be a little better informed.

Fearless (Mandarin title: Huò Yuánjiǎ / 霍元甲), which is billed as Jet Li’s final martial-arts movie, has been out in Asia since January but won’t reach the States until later this year. (I have no plans to see this movie, which appears from the trailer to be a string of the usual clichés. And, anyway, I have yet to forgive Jet Li for appearing in Hero, which is probably the biggest cinematic valentine to totalitarianism since Triumph of the Will.) One of the trailers for Fearless features a number of Chinese characters. They’re even written correctly. But, oddly enough, interspersed with the Chinese characters are zhuyin fuhao, also known as bopo mofo, a semi-syllabic script used in Taiwan mainly to help teach children to read. Odder still, the zhuyin make absolutely no sense.

Here’s how Taiwanonymous, on whose site I found this story, puts it:

Intercut with scenes from the movie was a burnt-yellow background, suggesting aged parchment, with Chinese characters flying past. Along with the Chinese characters were some Mandarin phonetic symbols (zhuyin fuhao 注音符號). It’s bad enough that they included phonetic symbols (which are mainly used in Children’s books) in the flying sea of what wanted to be an ancient Chinese text, but the symbols flew past in strings of gibberish! Imagine the following text dramatically moving across the screen, “Integrity… Peace… Courage… Cabnap… Grunplitk… Uwsugls.” Gives you chills just thinking about it.

Here’s a screenshot from the trailer:
gibberish zhuyin in the background

Just below COMING SOON is a giant ?. For something written in English this would be the equivalent of putting a large letter G on the screen.

Along the right side of the screen is the following, in zhuyin fuhao: ㄇㄞㄒㄖㄘ. This, in Hanyu Pinyin, would be “maixrici,” which is complete gibberish. The other vertical lines of text are also nonsense in zhuyin fuhao.

Again, there’s nothing wrong with how these are written. It’s just that they’re no more meaningful than a random string of letters.

Here’s one more shot:
gibberish zhuyin in the background
The zhuyin fuhao on the left read, from top to bottom, ㄔㄐㄎㄊㄆ, which would be “chjktp” in Hanyu Pinyin. As I think should be obvious even to those who don’t know Mandarin or any other Sinitic language, this is simply nonsense.