variant Chinese characters and Unicode

A submission to the Unicode Consortium’s Ideographic [sic] Variation Database for the “Combined registration of the Adobe-Japan1 collection and of sequences in that collection” is available for review through November 25. This submission, PRI 108, is a revision of PRI 98.

This set “enumerates 23,058 glyphs” and contains 14,664 tetragraphs (Chinese characters / kanji). About three quarters of Unicode pertains to Chinese characters.

Two sets of charts are available: the complete one (4.4 MB PDF), which shows all the submitted sequences, and the partial one (776 KB PDF), which shows “only the characters for which multiple sequences are submitted.”
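For readers unfamiliar with the mechanism: a variation sequence pairs a base character with a variation selector, and for ideographs the selectors come from the Variation Selectors Supplement block (U+E0100 to U+E01EF). Below is a minimal Python sketch for spotting such pairs in a string. The function name is my own invention, and nothing here is taken from the submission itself.

```python
# Variation Selectors Supplement block, used by ideographic variation sequences.
VS_FIRST, VS_LAST = 0xE0100, 0xE01EF


def is_selector(ch):
    """True if `ch` is a variation selector from the supplement block."""
    return VS_FIRST <= ord(ch) <= VS_LAST


def find_variation_sequences(text):
    """Return (base, selector) code point labels for every base character
    that is immediately followed by a supplementary variation selector."""
    pairs = []
    for base, following in zip(text, text[1:]):
        if is_selector(following) and not is_selector(base):
            pairs.append((f"U+{ord(base):04X}", f"U+{ord(following):04X}"))
    return pairs
```

Called on a string such as "\u845b\U000E0100", it returns a single pair, U+845B with U+E0100. That sequence is used only as an illustration and has not been checked against the charts.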

Below is a more or less random sample of some of the tetragraphs.

Initially I was going to combine this announcement with a rant against Unicode’s continued misuse of the term “ideographic.” But I’ve decided to save that for a separate post.

sample image of some of the kanji variants in the proposal

names of love hotels in macho kanji and other scripts

Donald Richie’s recent review of Japanese Love Hotels: A Cultural History, by Sarah Chaplin, has the following interesting section:

The contemporary love hotel is now much more kawaii (cute) than kinky.

Among the reasons offered for this is that there has been something of a power shift in love-hotel choice. It used to be the male half that decided. Back then the places had hopeful macho monikers — Empire, Rex, King. Then the female half began to choose. Love hotels started calling themselves “fashion hotels” or “boutique hotels,” and began to have lavish lobbies with theme-shops, colors like beige and lavender, and decor like Laura Ashley.

This change can be documented in the Meguro Emperor (still in Meguro), which began in 1973 as a he-man fort before it slowly metamorphosed into a romantic Disneyland castle. The interior has been several times revised to segue from male- to female-friendly. Even the name has changed. It is now Gallery Hotel.

In most love hotels “macho” kanji has been replaced by “feminine” hiragana, trendy katakana or, more often, romaji, that romanized script that carries no male/female associations at all.

source: It’s ladies first now in Japanese love hotels, Japan Times, August 26, 2007

Japanese and attitudes toward kanji

Ken of What Japan Thinks has helpfully translated into English the results of a recent poll of 1,010 Japanese adults on their attitudes about kanji ability.

A total of 95 percent of those polled said they believe the kanji ability of elementary and middle school children is “undesirably low.” Of those giving this response, 56 percent associated the problem with a drop in school education levels.

A slight majority (52 percent) of all those polled reported a lack of confidence in their own kanji ability.

Here are the questions. For the responses, see the translation or the poll results in Japanese (『漢字力』などに関する調査, Goo Research, June 27, 2007):

  • Do you feel that elementary and middle school children’s kanji ability is sufficient?
    • It’s undesirably low
      • Why do you think that?
    • It’s not a problem
      • Why do you think that?
  • Do you have confidence in your own kanji ability?
    • Yes
    • No
      • Why don’t you have confidence in your own kanji ability?
  • What do you do when you cannot produce a kanji character?

    Japanese literacy–an SPP reissue

    Here’s another re-release from the archives of Sino-Platonic Papers: Computers and Japanese Literacy: Nihonzin no Yomikaki Nôryoku to Konpyûta, by J. Marshall Unger of the Ohio State University’s Department of East Asian Languages and Literatures. The link above is to the PDF version (1.2 MB), which reproduces the original exactly.

    This is a parallel text in Japanese (in romanization) and English, so if any of you want to practice reading romaji, here’s your chance.

    The English text alone is available in HTML: Computers and Japanese Literacy.

    The essay touches on many of the themes Unger explores in depth in his books, all of which have excerpts available here on Pinyin Info: The Fifth Generation Fallacy, Literacy and Script Reform in Occupation Japan, and Ideogram: Chinese Characters and the Myth of Disembodied Meaning.

    Here is the opening, in both English and Japanese (in romanization).

    Watakusi wa saikin, gendai no konpyûta siyô to Nihongo ni tuite kenkyu site orimasu. Gengogakusya mo konpyûta no nôryoku ya mondaiten ni tuite iken o happyo suru sekinin ga aru to omou kara desu.

    I am currently engaged in research on contemporary computer usage and the Japanese language. Linguists too, I believe, have a responsibility to present their views on the potentials and problems of computers.

    Sate, Amerika no zen-Kôsei Kyôiku tyôkan, John Gardner-si no kotoba de hazimetai to omoimasu. Sore wa “aizyô nasi no hihan to hihan nasi no aizyô” (Eigo de iu to, “unloving criticism and uncritical love”) to iu kotoba desu. Gardner-si wa, Amerikazin no aikokusyugi ni tuite Amerika o sukosi de mo hihan site wa ikenai to syutyô suru hito wa kangaetigai da, aizyô nasi ni syakai ya bunka no ketten o hihan bakari suru koto wa motiron warui keredo, hihan sore zitai o kiratte kokusuisyugi o susumeru koto mo syôrai no tame ni yoku nai, to iimasita. Kono koto wa bokoku igai no syakai to bunka ni tai suru baai de mo onazi de wa nai desyô ka? Gengogakusya ya rekisigakusya mo “aizyô nasi no hihan to hihan nasi no aizyô” to iu ryôkyokutan o sakeru yô ni sita hô ga ii to omou no desu. Watakusi wa Nihon no gengo to bunka o senmon ni site, Nihon ni tai site aizyô o motte orimasu kara koso, Nihongo no hyôkihô ya Nihonzin no yomikaki nôryoku ni tuite no teisetu o mondai ni site iru wake desu. Iwayuru zyôhôka syakai no zidai ni hairi, ippan no hitobito ga pasokon ya wâpuro o kozin-yô ni tukau yô ni naru ni turete, nettowâku tûsin, kyôiku-yô sohutowea, sôzôteki na puroguramingu nado ga yôkyû sarete kite iru desyô. Mosi sono konpon ni aru yomikaki nôryoku no henka to genzyô o gokai sureba, gôriteki na konpyûta siyôhô o kaihatu dekinai darô to omou kara desu.

    Let me begin by quoting the former U.S. Secretary of Health, Education, and Welfare, John Gardner. I am thinking of his phrase “unloving criticism and uncritical love.” By this, he meant that it was wrong for proponents of American patriotism to oppose even the slightest criticism of the United States: although it is bad to dwell unsympathetically on finding fault with social and cultural shortcomings, it is equally bad for the future of society to advance nationalism and eschew all criticism. I think that this is also true when considering foreign societies and cultures. Linguists and historians would do well to avoid the twin extremes of “unloving criticism and uncritical love.” As someone professionally involved with the language and culture of Japan, I have an affection for the country, but for that very reason, I wish to call into question the accepted theory of Japanese script and literacy. As we enter the age of the so-called informational society, and as more and more ordinary people begin to use computers on an individual basis, demands on network communications, educational software, creative programming, and so on, will steadily increase. Unless we understand the present situation and history of literacy, which underlies all these applications, we cannot hope to develop a rational basis for computer usage.

    Sate, hyôi mozi to iu kotoba wa Nihongo ni tuite no hon ni yoku dete imasu kara kokugogaku no yôgo da to itte mo ii hodo desu ga, hyôi mozi to iu mono wa zissai ni sonzai site iru desyô ka? Kyakkanteki ni kangaete miru to, dono gengo mo konponteki ni wa hanasu mono desu. Mozi wa syakaiteki, rekisiteki na men ga arimasu ga, mozi wa kotoba no imi no moto de wa arimasen. Tatoeba, itizi mo yomenai mômoku no hito de mo, hoka no syôgai ga nai kagiri, bokokugo ga kanzen ni hanaseru yô ni narimasu. Sitagatte, hanasi-kotoba to wa mattaku kankei ga nai mozi nado to iu mono wa muimi na gainen desu. Gengo no imi wa gengo no kôzô kara hassei si, mozi wa sono han’ei de sika nai wake desu. Kore wa toku ni kore kara no konpyûta o kangaeru toki ni wasurete wa ikemasen….

    The term “ideographic characters” appears so often in books on the Japanese language that one might say it has become a stock phrase of Japanese linguistics. I wonder, however, whether such things as “ideographs” actually exist. When examined objectively, all languages are fundamentally speech. Characters are not the source of the meanings of words, although they do have their social and historical aspects. For example, blind people who cannot read a single character can nonetheless speak their native tongues perfectly, unless they suffer from some other handicap. The very idea of characters totally divorced from speech is therefore meaningless. For the meaning of language emerges from the structure of language, of which writing is merely a reflection. It is particularly important that we not forget this when we consider the computers of the future….

    This was first published in January 1988 as issue no. 6 of Sino-Platonic Papers.

    indicator of character frequency: a suggestion for programmers

    It occurred to me the other day that many people, especially language learners, might find it useful to have a tool that would take text written in Chinese characters and mark it up according to the frequency of use of the individual characters within.

    Here’s a sentence from a recent CCP rant (officially a news item) that can serve as an example:

    (Fēilǘfēimǎ de “wǎng yǔ” bùzài mǎnzú yú piān’ān wǎngluò yīyú, zhèng xùnsù xiàngzhe qítā méitǐ shèntòu, yīn’ér jiājù le bàozhǐ diànshì děng wénzì yǔyán de hùnluàn, diànwū le Hànyǔ yán wénhuà de chúnjié.)

    Predictably, many of the characters here are extremely common. Others, however, would not even be covered under China’s definition of literacy. I’ve separated these characters into different classes, based on their frequencies of usage and applied different colors to each class:

    • character frequency: 1-100 (class i-c)
    • character frequency: 101-500 (class c-d)
    • character frequency: 501-1000 (class d-m)
    • character frequency: 1001-1500 (class m-md)
    • character frequency: 1501-2000 (class md-mm)
    • character frequency: beyond 2000 (class mmplus)
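The classes above amount to a lookup from a character's frequency rank to a band. Here is a rough Python sketch of that lookup, assuming ranks are 1-based positions in some frequency list; the function name is hypothetical, and the class names are simply the ones listed above.

```python
from bisect import bisect_left

# Upper bound of each band, and the class name for each band.
# The final class covers everything beyond the last bound.
BOUNDS = [100, 500, 1000, 1500, 2000]
CLASSES = ["i-c", "c-d", "d-m", "m-md", "md-mm", "mmplus"]


def frequency_class(rank):
    """Return the class name for a 1-based frequency rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    # bisect_left finds the first band whose upper bound is >= rank.
    return CLASSES[bisect_left(BOUNDS, rank)]
```

A rank of 100 falls in i-c, 101 in c-d, and anything from 2001 up in mmplus.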

    So the sample sentence would look like this:


    (Those of you reading this through RSS may need to visit the site to see what I’m talking about.)

    The coding I used looks like this, though other approaches are possible:

    <span class="c-d" title="101-500">非</span><span class="mmplus" title="2001+">驴</span>….
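Generating that markup automatically might look something like the Python sketch below. It is only one possible approach and rests on several assumptions of mine: the `ranks` dictionary stands in for a real frequency list (the numbers in the usage note are made up), only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is treated as Chinese characters, and any character missing from the list is treated as falling beyond 2000.

```python
from bisect import bisect_left
from html import escape

BOUNDS = [100, 500, 1000, 1500, 2000]
CLASSES = ["i-c", "c-d", "d-m", "m-md", "md-mm", "mmplus"]
TITLES = ["1-100", "101-500", "501-1000", "1001-1500", "1501-2000", "2001+"]


def mark_up(text, ranks):
    """Wrap each Chinese character in a span carrying its frequency class.

    `ranks` maps a character to its 1-based frequency rank. Characters
    outside the basic CJK block (punctuation, Latin letters, and so on)
    are passed through with HTML escaping only.
    """
    parts = []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            # Unlisted characters get a rank just past the last bound.
            rank = ranks.get(ch, BOUNDS[-1] + 1)
            band = bisect_left(BOUNDS, rank)
            parts.append(
                f'<span class="{CLASSES[band]}" title="{TITLES[band]}">{ch}</span>'
            )
        else:
            parts.append(escape(ch))
    return "".join(parts)
```

With a made-up list such as {"非": 300}, calling mark_up("非驴", ...) produces the two spans shown above, since 驴 is absent from the list and so lands in mmplus.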

    I added titles to make this more accessible.

    Perhaps adding a summary would be useful:

    1-100        24.6%
    101-500      42.1%
    501-1000      8.8%
    1001-1500    14.0%
    1501-2000     1.8%
    2001+         8.8%
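A summary like this is just a count per band divided by the total number of characters. Here is a small Python sketch, assuming the ranks of the characters in the text have already been looked up; the function name is hypothetical.

```python
from bisect import bisect_left
from collections import Counter

BOUNDS = [100, 500, 1000, 1500, 2000]
TITLES = ["1-100", "101-500", "501-1000", "1001-1500", "1501-2000", "2001+"]


def band_summary(ranks):
    """Return (band label, percentage) pairs for a list of frequency ranks,
    one rank per character occurrence in the text."""
    if not ranks:
        raise ValueError("need at least one rank")
    counts = Counter(bisect_left(BOUNDS, rank) for rank in ranks)
    total = len(ranks)
    return [
        (TITLES[band], round(100 * counts[band] / total, 1))
        for band in range(len(TITLES))
    ]
```

The figures above are consistent with a 57-character sentence split 14, 24, 5, 8, 1, and 5 across the bands, though those counts are my inference from the percentages rather than something stated in the post.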

    This approach could also be used for Japanese — for example, to highlight all kanji not included in the Jōyō kanji, or to highlight different sets of the Kyōiku kanji. For that matter, it could also be applied to written words in English or other languages that use alphabets, though conjugations, plurals, and the like would complicate matters.
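For the Japanese case, the check reduces to set membership. Below is a minimal Python sketch, assuming the permitted kanji are supplied as a set; the tiny set in the usage note is a stand-in, not the real Jōyō list, and the function name is hypothetical. As a simplification, only the basic CJK Unified Ideographs block is treated as kanji.

```python
def kanji_outside(text, allowed):
    """Return the kanji in `text` that are not in `allowed`,
    in order of first appearance and without repeats."""
    found = []
    for ch in text:
        is_kanji = "\u4e00" <= ch <= "\u9fff"
        if is_kanji and ch not in allowed and ch not in found:
            found.append(ch)
    return found
```

Given the stand-in set {"漢", "字", "力"} and the text "漢字を忘れる", only 忘 is reported; the hiragana are ignored because they fall outside the kanji range.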

    So, would anyone like to try coming up with one of these? Or has it been done already?

    one possible resource:

    85 percent of Japanese report weakening of ability to write kanji: poll

    I may have understated the headline by using “weakening.” Regardless, though, the figures are dramatic.

    People are becoming accustomed to computer-aided input of kanji and are thus forgetting how to write them by hand. This is only going to get worse, not better.

    For a brief English article on this, see the link below.

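    A survey conducted in June produced these results, but what about the latest survey? On December 12, which is “Kanji Day,” the Nintendo DS software title “Kanken DS” released the results of a survey of attitudes toward kanji.

    When respondents were asked about the causes of the decline in their kanji ability (multiple answers allowed), the most common response was “because I use a PC a lot,” at 87.4 percent. This was followed by “because I use a mobile phone (mobile e-mail) a lot” (43.8 percent) and “declining memory with age” (41.8 percent).

    Associate Professor Sasahara traced the decline in kanji ability to the digitization of the act of writing kanji that has come with the spread of PCs and similar devices, but the wave of digitization may also be reaching the study methods used to improve that weakened ability.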