Sign in seal script

It’s time again to play What’s That Character?

Feel free to ask others what they think, though enlisting the aid of historians and calligraphy masters would count as cheating, since none of these examples comes from a museum or a calligraphy scroll. They all come from a sign outside a building, meant to be read by all.

Chinese character number one:
[Image: the first mystery character]

Chinese character number two:
[Image: the second mystery character]

And Chinese character number three:
[Image: the third mystery character]

OK? Ready?

Here are the answers:

How’d you do?

*** SPOILERS BELOW ***

If you got even one right, hái bùcuò (not bad). That’s probably as well as, or better than, the average person literate in Chinese characters would do.

Here is the entire sign, which will probably make things much clearer.

If you’re a Mandarin speaker and used to reading Chinese characters, you can probably tell what the entire sign says without too much effort. But as this exercise may help to show, that is not because most people can truly read all the characters but because they can fill in the blanks, as it were, when presented with adequate context. Yes, those are all written in seal script, not in a modern style; but seal script is all that is given on the sign.

I want to stress that this isn’t a sign for a historical museum or even the Cultural Bureau. Nope, it’s for the Xinbei City Government’s Environmental Protection Department, here in lovely Banqiao. Those used to the ways of Taiwan (or maybe just the ways of the world) have probably already correctly guessed that it was the director who thought a seal-script font would be a good idea. (See the news stories below for more on that. Although the reports are from a couple of years ago, I took the photos just a couple of weeks ago.)

Don’t forget: If you want to put Chinese characters or tonal Pinyin in your comments, use the encoder first and copy and paste the results into the comments box.

News stories:

Possible workaround for the encoding problem

Earlier this evening, as I was browsing through some old posts here on Pinyin News, I was startled to notice in one post the presence of some Chinese characters that were not scrambled but correct (e.g., here). I investigated.

The difference between that page and most others is that it employed Unicode numerical character references (NCRs) for Chinese characters and diacritics.

Entering such code appears to be a stable way of getting around the hack. For example, for the characters “漢字” to appear correctly in a published post despite the hack that switched the encoding to Swedish, they would need to be entered in the post’s HTML as “&#28450;&#23383;” rather than in the more human-friendly direct form “漢字”. The same goes for most diacritics.
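For illustration, here is a minimal Python sketch of that conversion. It assumes decimal NCRs like the ones above and leaves plain ASCII untouched; `to_ncr` is a hypothetical helper, not the code behind any existing tool.

```python
import html

def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal numerical character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

print(to_ncr("漢字"))  # &#28450;&#23383;

# Pinyin tone marks are handled the same way; only the marked vowels change.
print(to_ncr("Wǒ ài Hànyǔ Pīnyīn!"))

# A browser (or html.unescape) turns the references back into the original text.
print(html.unescape(to_ncr("漢字")))  # 漢字
```

Because the output contains nothing but ASCII, it can be pasted into a post or comment box regardless of how the surrounding page is stored.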

So, although this isn’t a real solution, at least I should be able to make new posts here render correctly; and, no less importantly, these new posts should remain correct even after the encoding problem for the rest of the blog is finally fixed, since NCRs are ASCII-friendly and thus shouldn’t become scrambled.
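Why NCRs should stay intact can be checked directly: they consist only of ASCII characters, and ASCII bytes mean the same thing in Latin-1 and in UTF-8. A small Python sketch, using the standard `latin-1` codec as a rough stand-in for the blog database’s latin1 setting (an assumption; MySQL’s latin1 actually behaves like Windows-1252):

```python
# Example strings: the direct form and its NCR equivalent.
direct = "漢字"
ncr = "&#28450;&#23383;"

# The NCR form is pure ASCII, so it encodes to ASCII bytes...
ncr_bytes = ncr.encode("ascii")
# ...and those bytes decode to the same text whether read as Latin-1 or as UTF-8.
print(ncr_bytes.decode("latin-1") == ncr_bytes.decode("utf-8") == ncr)  # True

# The direct form lies outside Latin-1: in UTF-8 each character takes three bytes,
# and reading those bytes as Latin-1 scrambles them.
direct_bytes = direct.encode("utf-8")
print(direct_bytes.decode("utf-8") == direct)    # True
print(direct_bytes.decode("latin-1") == direct)  # False
```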

This should also mean that your comments can again safely include Chinese characters, etc., as long as you use NCRs as well, which can be accomplished with relatively little hassle by employing Pinyin.info’s online tool to convert Chinese characters and Pinyin diacritics to Unicode numerical character references. Check it out.

Look, Ma, no GIFs or PNGs!

Wǒ ài Hànyǔ Pīnyīn!
我愛漢語拼音!

Dissolving Pinyin

Late last week, Victor Mair (with some assistance from Matt Anderson, David Moser, me, and others) wrote “‘Lobsters’: a perplexing stop motion film,” a post about a short 1959 film from China that includes some Pinyin. In some cases, the Pinyin is shown for a second and then quickly dissolves into Chinese characters. Since Victor’s post supplies only the text, I thought I’d supplement it here with images from the film.

See the original post for translations and discussion.

The film often shows a newspaper. The headline (at 7:57) reads (or rather should read, since the first word is misspelled):

QICHE GUPIAO MENGDIE
DAPI LONGXIA ZHIXIAO

[Screenshot at 7:57]

But since the image above doesn’t show the name of the paper, I’m also offering this rotated and cropped photo, which shows that the paper is the “JIN YUAN DIGUO RI-BAO.”
[Image: rotated and cropped view showing the newspaper’s name]

Elsewhere, there are again some g’s for q’s. For the first example of text dissolving from Pinyin to Chinese characters (at 2:11), I’m offering screenshots of the text in Pinyin, the text during the dissolve, and the text in Chinese characters. Later I’ll give just the Pinyin and Chinese characters.

Hongdang Louwang
Yipi hongdang zai daogi [sic] jiudian jihui buxing guanbu [sic] louwang

[Screenshots at 2:11: the text in Pinyin, during the dissolve, and in Chinese characters]

Soon thereafter (at 2:44), we get a handwritten note.
[Screenshots at 2:44: the note in Pinyin and in Chinese characters]

At 3:39 we’re shown the newspaper’s printed notice of the above text.
[Screenshots at 3:39: the notice in Pinyin and in Chinese characters]

A brief glance at the newspaper at 3:23 gives us FA CHOU (“stink”), which probably refers to the smell the bad lobsters are giving off.
[Screenshot at 3:23]

Here a man is carrying a copy of Zibenlun (Das Kapital), by Makesi (Marx).
[Screenshot at 9:11]

Actually, it’s not really Das Kapital, just the cover of the book; inside is a stack of decadent Western material. “MEI NE” is probably supposed to be “MEINÜ” (beautiful women).
[Screenshot at 5:58]

I imagine that the artists for this film, working in the PRC of 1959, must have inwardly rejoiced at the chance to draw something like that for a change, and I suspect that is also why there’s a nude on the wall in one scene.
[Screenshot at 4:39]

UTF-8 Unicode vs. other encodings over time

Some eight years ago UTF-8 (Unicode) became the most used encoding on Web pages. At the time, though, it was used on only about 26% of Web pages, so it had a plurality but not an absolute majority.

[Graph: growth of the UTF-8 encoding]

By the beginning of 2010 Unicode was rapidly approaching use on half of all Web pages.
[Graph: a steep rise in the use of UTF-8 and a steep decline in other major encodings]

As of 2012, the trends were holding.
[Graph: UTF-8 website use, 2001-2012]

Note that the 2008 crossover point appears different in the latter two Google graphs, which is why I’m showing all three graphs rather than just the third.

A different source (with slightly different figures) provides a look at the situation up to the present, with UTF-8 now on 85% of Web pages. The expansion of UTF-8 is slowing somewhat, but that may be due largely to the continuing presence of older websites in non-Unicode encodings rather than to lots of new sites going up in encodings other than UTF-8.
[Graph: growth in Unicode UTF-8 encoding on Web pages, 2010-2015]

Here’s the same chart, but focusing on encodings other than UTF-8 that include Chinese characters, which is why the percentages are relatively low.
[Chart: Asian-language encodings, 2010-2015]

And here’s the same as the above, but with the results for individual languages combined.
[Chart: Asian-language encodings by language, 2010-2015]

By the way, Pinyin.info has been in UTF-8 since the site began way back in 2001. Chinese characters and Pinyin with tone marks appear scrambled within Pinyin News because a hack caused the WordPress database to be set to Swedish (latin1_swedish_ci), of all things. I haven’t been able to get it fixed, so for the time being I’ve given up trying. One of these days…
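The scrambling itself can be sketched in a few lines of Python. This assumes the simplest form of the problem, UTF-8 bytes read back as Latin-1, with Python’s `latin-1` codec standing in for MySQL’s latin1; the actual damage in a WordPress database may involve additional rounds of conversion. `scramble` and `unscramble` are hypothetical names.

```python
def scramble(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 (classic mojibake)."""
    return text.encode("utf-8").decode("latin-1")

def unscramble(garbled: str) -> str:
    """Undo one round of that misreading: recover the bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")

original = "Pīnyīn 拼音"
garbled = scramble(original)
print(garbled)              # begins "PÄ«nyÄ«n", then gibberish where 拼音 was
print(unscramble(garbled))  # Pīnyīn 拼音
```

Plain ASCII passes through `scramble` unchanged, which is why English text on an affected blog still looks fine while tone marks and characters do not. Note that `unscramble` only reverses a single round of misreading.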

Sources:

Pinyin font: Skarpa

Today’s Pinyin-friendly font is Skarpa, by Aga Silva of Poland. It’s a bit quirky (e.g., second-tone o’s and lowercase q’s) but still sharp.

[Image: Hanyu Pinyin pangram set in the Skarpa font]

Skarpa was later reworked into Skarpa 2, which is not free but comes in several weights and types.

Most of Silva’s other fonts can also handle Pinyin with tone marks, though those are all commercial rather than free.

Popularity of Chinese character country code TLDs

Yesterday we looked at the popularity of the Chinese character TLD for Singapore Internet domains. Today we’re going to examine the Chinese character ccTLDs (country code top-level domains) for those places that use Chinese characters and compare the figures with those for the respective Roman alphabet TLDs.

In other words, how, for example, does the use of .台灣 domains compare with the use of .tw domains?

Since, unlike the case with Singapore, I don’t have the registration figures, I’m having to make do with Google hits, which is a different measure. For this purpose, Google is unfortunately a bit of a blunt instrument. But at least it should be a fairly evenhanded blunt instrument and will be useful in establishing baselines for later comparisons.

A few notes before we get started:

  • Japan has yet to bother with completing the process for its own name in kanji (日本), so it is omitted here.
  • Macau only recently asked for .澳门 (simplified) and .澳門 (traditional), so those figures are still at zero.
  • Oddly enough, there’s no .臺灣 ccTLD, even though the Ma administration, which was in power when Taiwan’s ccTLDs went into effect, officially prefers the more complex form 臺灣 to 台灣, not to mention preferring it to the simplified 台湾.
Google hits (and percent of the total for that place):

MACAU
  .mo: 18,400,000 (100.00%)
  .澳门 (simplified): 0 (0.00%)
  .澳門 (traditional): 0 (0.00%)
TAIWAN
  .tw: 206,000,000 (99.86%)
  .台湾 (simplified): 67,600 (0.03%)
  .臺灣 (traditional, with 臺): 0 (0.00%)
  .台灣 (traditional): 230,000 (0.11%)
HONG KONG
  .hk: 193,000,000 (99.94%)
  .香港: 118,000 (0.06%)
SINGAPORE
  .sg: 97,800,000 (100.00%)
  .新加坡: 2 (0.00%)
CHINA
  .cn: 315,000,000 (99.61%)
  .中国 (simplified): 973,000 (0.31%)
  .中國 (traditional): 251,000 (0.08%)

So in no instance does the Chinese character ccTLD reach even one half of one percent of the total for any given place.
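Each percentage above is simply a TLD’s share of the total Google hits for its place. A minimal Python sketch that recomputes the shares from the hit counts in the table and sums the Chinese-character ccTLDs for each place (the data layout and the `shares` helper are just illustrative choices):

```python
# Google hit counts from the table above, grouped by place.
hits = {
    "Macau": {".mo": 18_400_000, ".澳门": 0, ".澳門": 0},
    "Taiwan": {".tw": 206_000_000, ".台湾": 67_600, ".臺灣": 0, ".台灣": 230_000},
    "Hong Kong": {".hk": 193_000_000, ".香港": 118_000},
    "Singapore": {".sg": 97_800_000, ".新加坡": 2},
    "China": {".cn": 315_000_000, ".中国": 973_000, ".中國": 251_000},
}

def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each TLD's percentage of the place's total hits."""
    total = sum(counts.values())
    return {tld: 100 * n / total for tld, n in counts.items()}

for place, counts in hits.items():
    pct = shares(counts)
    # In this table, every non-ASCII TLD is a Chinese-character ccTLD.
    cjk_total = sum(p for tld, p in pct.items() if not tld.isascii())
    rounded = {tld: round(p, 2) for tld, p in pct.items()}
    print(f"{place}: {rounded}; Chinese-character ccTLDs combined: {cjk_total:.2f}%")
```

Even combining both forms, China’s Chinese-character ccTLDs come to about 0.39% of its total, still under half a percent.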

Here are the results in a chart.

[Chart: although China leads in domains in Chinese characters, they do not reach even one half of one percent of the total for China]

Note that the simplified:traditional ratios in China and Taiwan are roughly mirror images of each other, as is perhaps to be expected.
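To put numbers on the “mirror image,” here is a small Python sketch comparing simplified-form and traditional-form hits for each place, using the figures from the table (for Taiwan, .台灣 is taken as the traditional form, since .臺灣 has no hits). `dominant_form` is a hypothetical helper and assumes both counts are nonzero.

```python
# (simplified hits, traditional hits), taken from the table above
pairs = {
    "China": (973_000, 251_000),  # .中国 vs .中國
    "Taiwan": (67_600, 230_000),  # .台湾 vs .台灣
}

def dominant_form(simplified: int, traditional: int) -> tuple[str, float]:
    """Return the form with more hits and how many times more hits it has.

    Assumes both counts are nonzero.
    """
    if simplified >= traditional:
        return "simplified", simplified / traditional
    return "traditional", traditional / simplified

for place, (simplified, traditional) in pairs.items():
    form, factor = dominant_form(simplified, traditional)
    other = "traditional" if form == "simplified" else "simplified"
    print(f"{place}: {factor:.2f} {form}-form hits per {other}-form hit")
```

With the table’s figures, China comes out at about 3.88 simplified-form hits per traditional-form hit, and Taiwan at about 3.40 traditional-form hits per simplified-form hit.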

See also Platform on Tai, Pinyin News, December 30, 2011