massive Korean dictionary of Chinese characters nears completion

The final volumes in what is being touted as the world’s largest Chinese character dictionary are scheduled to be published in May.

The fifteen-volume work (excluding the index) will reportedly cover some 60,000 Chinese characters and include about 500,000 Sinitic words. By comparison, the Zhongwen da cidian (中文大辭典 / Zhōngwén dà cídiǎn), published in Taiwan in the 1960s, covers 49,905 Chinese characters.

The project was initiated by the Institute of Oriental Studies of Dankook University, South Korea, in 1978.

The first volume of the 『漢韓大辭典』 (in Mandarin: Hàn-Hán dà cídiǎn; “Dictionary of Chinese characters Korean use,” as it is translated on the institute’s Web site) was issued in 1999. Last year, volumes 10-12 were published.

The project has reportedly cost more than W20 billion (US$21.3 million).

Yet more work may still be needed.

Prof. Kim Eon-jong of the Department of Korean Literature in Classical Chinese at Korea University said, “This project has great significance from the standpoint of cultural history. But it’s a pity that the institute hastened the final stage. It must complement and supplement the dictionary later.”


wanted: linguistically interesting Taiwan campaign material

Taiwan’s new method for electing legislators (with one directly elected legislator per relatively small district instead of many legislators for large districts) means that areas no longer have an enormous variety of campaign signs on display. So I don’t get to see nearly as many signs as during previous elections.

Outside my neighborhood I’ve seen some signs with zhuyin (usually there for writing something in Taiwanese). But I haven’t been able to get any photos of these or other such signs. So I’m hoping that others might send in some photos, if you see anything interesting.

I’m specifically looking for:

  • signs with zhuyin (bopomofo) or romanization
  • signs using languages other than Mandarin (e.g., Taiwanese, English)
  • signs using puns, esp. if the puns involve more than one language
  • anything else linguistically interesting

Please e-mail me your finds. (I promise to try to get them online quicker than with my usual six-month delay.) Or add comments here pointing me toward examples you’ve already put online or seen elsewhere.

Some examples in previous posts:

English + Chinese characters for Cantonese: Number 1!

Andy Lau being presented with the calligraphy scroll discussed in this post
Joel of Danwei has posted about an interesting calligraphy scroll presented to Hong Kong superstar Andy Lau.

The characters read “You Are No. 1!”

That’s not a translation: the Cantonese pronunciation of the characters 腰呀冧吧温! (“yiu a nam ba wan!”) approximates the English sentence.

I just love stuff like this.

Read in Mandarin this is just gibberish, especially the character .

Read the whole post for details.

The technique also recalls the cover of Visible Speech, by John DeFrancis, which renders part of the Gettysburg Address phonetically in various scripts, some more closely than others (see the bottom line for Chinese characters with Mandarin pronunciations):
'four score and seven years ago' in lots of different scripts

source: If you can read this, you’re Number One!, Danwei, December 6, 2007

Compensation for kanji-input basic technology subject of lawsuit

A Japanese man who says he invented the technology behind the context-based conversion of a sentence written solely in kana into one in both kanji and kana, as well as another related technology, filed suit against Toshiba on December 7, seeking some US$2.3 million in compensation from his former employer.

Shinya Amano, a professor at Shonan Institute of Technology, said in a written complaint that although the firm received patents for the technologies in conjunction with him and three others and paid him tens of thousands of yen annually in remuneration, he actually developed the technologies alone.

Amano is claiming 10 percent of an estimated ¥2.6 billion in profit Toshiba made in 1996 and 1997 — much higher than the roughly ¥230,000 he was actually awarded for the work over the two-year span.

His claim is believed valid, taking into account the statute of limitations and the terms of the patents.

“This is not about the sum of the money — I filed the suit for my honor,” Amano said in a press conference after bringing the case to the Tokyo District Court.

“Japan is a technology-oriented country, but engineers are treated too lightly here,” he said.

Toshiba said through its public relations office that it believes it paid Amano fair compensation in line with company policy. The company declined to comment on the lawsuit before receiving the complaint in writing.

Amano claims that he invented the technology that converts a sentence composed of kana alone into a sentence composed of both kanji and kana by assessing its context, and another technology needed to prioritize kanji previously used in such conversions.

Using theories of artificial intelligence, the two technologies developed in 1977 and 1978 are still used today in most Japanese word-processing software, he said.
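To make the two ideas concrete, here is a toy sketch in Python. It is emphatically not Amano's or Toshiba's method, whose details the article does not give. The tiny lexicon, the cue words, and the names `LEXICON`, `ToyConverter`, `convert`, and `learn` are all made up for illustration. The sketch picks a kanji for a kana reading by counting context cue words, then breaks ties in favor of kanji the user has chosen before.

```python
from collections import defaultdict

# Hypothetical toy lexicon: kana reading -> candidate kanji -> context cue words.
LEXICON = {
    "はし": {
        "橋": {"わたる", "かわ"},    # bridge: cross, river
        "箸": {"たべる", "ごはん"},  # chopsticks: eat, rice
        "端": {"みち"},              # edge: road
    },
}


class ToyConverter:
    """Illustrative kana-to-kanji chooser; an assumption-laden toy, not a real IME."""

    def __init__(self, lexicon):
        self.lexicon = lexicon
        self.usage = defaultdict(int)  # (reading, kanji) -> times the user chose it

    def learn(self, reading, kanji):
        """Record that the user selected this kanji for this reading."""
        self.usage[(reading, kanji)] += 1

    def convert(self, reading, context_words):
        """Return the best candidate, or the kana unchanged if the reading is unknown."""
        candidates = self.lexicon.get(reading)
        if not candidates:
            return reading
        context = set(context_words)

        def score(kanji):
            # Context matches rank first; previous use breaks ties.
            return (len(candidates[kanji] & context), self.usage[(reading, kanji)])

        return max(candidates, key=score)


converter = ToyConverter(LEXICON)
by_context_cross = converter.convert("はし", ["わたる"])
by_context_eat = converter.convert("はし", ["たべる"])
converter.learn("はし", "端")
by_history = converter.convert("はし", [])
print(by_context_cross, by_context_eat, by_history)
```

A real converter also has to segment the kana string into words and weigh grammar, which this sketch skips entirely.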

source: Word-processor inventor sues Toshiba over redress, Kyodo News, via Japan Times, December 9, 2007

Du Ponceau online

Pinyin Info has long made available a selection from Peter S. Du Ponceau’s groundbreaking work on the nature of Chinese characters.

Google Print now offers the complete text of this book: A dissertation on the nature and character of the Chinese system of writing. (Also available as a 14.3 MB PDF.) This was first published in 1838; had more people paid attention to it, the ideographic myth might well have perished then instead of flourishing and continuing to plague us today.

Here is a fuller version of the title:

A Dissertation on the Nature and Character of the Chinese System of Writing, in a Letter to John Vaughan, esq … To which are Subjoined, a Vocabulary of the Cochinchinese Language by Father Joseph Morrone, R.C. Missionary at Saigon, … And a Cochinchinese and Latin Dictionary, in Use Among the R.C. Missions in Cochinchina.

Du Ponceau also did important work on some Native American languages and served as president of the American Philosophical Society for seventeen years. That organization continues its tradition of inducting distinguished members (MS Word document).

Alternate versions of Du Ponceau’s name: Peter Stephen DuPonceau and Pierre-Etienne Du Ponceau.

stret-sgn

I don’t bother with typos much, but this street sign stood out enough that I wanted to share it with everyone. I took this photo last weekend in Jiaoxi, Yilan County, a town on Taiwan’s east coast that is known for its hot springs (wēnquán). (Nice hiking there, too.) Taiwan’s official signage used to be rife with just this sort of sloppiness; the situation has improved somewhat this decade.

street sign reading '湯圍街 Tng-wi Rd.'

This should be “Tangwei St.” (Tāngwéi Jiē), not “Tng-wi Rd.”

I don’t know how old that sign is. Perhaps it dates from the MPS2 era. I saw only a few more street signs in Jiaoxi, and they were in Tongyong Pinyin, such as this one for what in Hanyu Pinyin would be Wēnquán Lù (Wenquan Road / 溫泉路).

two street signs atop one pole: one reading 'To Train Station', the other 'Wuncyuan Rd'
The strokes in the roman letters are a bit too thin for this sort of use.

stroke counts: Taiwan vs. China

One of the myths about Chinese characters is that for each character there is One True Way and One True Way Only for it to be written, with a specific number of specific strokes in a certain specific and invariable order. Generally speaking, characters are indeed taught with standard stroke orders with certain numbers of strokes (the patterns help make it less difficult to remember how characters are written) — but these can vary from place to place, though the characters may look the same. Moreover, people often write characters in their own fashion, though they may not always be aware of this.

Michael Kaplan of Microsoft recently examined the stroke data from standards bodies in China for all 70,195 “ideographs” [sic] in Unicode 5.0 and compared it against “the 54,195 ideographs for which stroke count data was provided by Taiwan standards bodies” to see how much of a difference there was in the stroke counts for the characters that both sides provided data for.

(I’m a bit surprised the two sides have compiled such extensive lists, and I’d love to see them. But that’s another matter.)

He found that 9,768 of these characters (18 percent) have different stroke counts between the two standards, with 9,045 characters differing by 1 stroke, 675 characters by 2 strokes, 44 characters by 3 strokes, 2 characters by 4 strokes, 1 character by 5 strokes, and 1 character by 6 strokes.
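For anyone who wants to check the arithmetic, or run the same kind of comparison on other data, here is a minimal Python sketch. It is not Kaplan's code. The function name `stroke_count_differences` and the two toy tables (with placeholder keys "A" through "D" rather than real characters) are my own assumptions. Only the reported breakdown and the 54,195 figure come from his post as quoted above.

```python
from collections import Counter


def stroke_count_differences(counts_a, counts_b):
    """Compare two stroke-count tables over the characters both contain.

    Returns (number of shared characters, Counter mapping the size of each
    nonzero difference to how many characters differ by that much).
    """
    shared = counts_a.keys() & counts_b.keys()
    diffs = Counter(abs(counts_a[ch] - counts_b[ch]) for ch in shared)
    diffs.pop(0, None)  # characters with identical counts are not differences
    return len(shared), diffs


# Breakdown as reported: difference in strokes -> number of characters.
reported = {1: 9045, 2: 675, 3: 44, 4: 2, 5: 1, 6: 1}
total_differing = sum(reported.values())
share = total_differing / 54195  # characters with data from both sides
print(total_differing, f"{share:.1%}")

# Made-up toy tables with placeholder keys, just to exercise the function.
toy_a = {"A": 10, "B": 7, "C": 4}
toy_b = {"A": 11, "B": 7, "D": 3}
toy_shared, toy_diffs = stroke_count_differences(toy_a, toy_b)
print(toy_shared, dict(toy_diffs))
```

The reported breakdown does add up to 9,768, which is about 18 percent of 54,195.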

Note: This is about stroke counts of matching characters, not about differing stroke counts for traditional and “simplified” characters — e.g., not 國 (11 strokes) vs 国 (8 strokes).

So, is this a case of chabuduoism, or of truly differing standards? The answer is not yet fully clear; but be sure to read Kaplan’s post and the comments there.


interviews with Y.R. Chao

I’ve just stumbled across a book-length series of interviews with Y.R. Chao (Zhao Yuanren / Zhào Yuánrèn / 趙元任 / 赵元任). Even better: The complete text is available for free on the Web!

China Scholars Series: Chinese linguist, phonologist, composer and author, Yuen Ren Chao. An Interview Conducted by Rosemary Levenson, with an introduction by Mary Haas.

Wow. This is absolutely fabulous. The Bancroft Library of the University of California, Berkeley, deserves praise for this. Other works of interest to readers of Pinyin News are also available; but more about those later, in separate posts.

In case any readers are not familiar with Chao (1892-1982), he was the finest linguist ever to come out of China. He was also a supporter of romanization; he was even the lead creator of an ingenious if somewhat complicated romanization system for Mandarin: Gwoyeu Romatzyh. But there’s no way a few short sentences could do justice to the depth and breadth of Chao’s learning. To get a better idea of the man, read the introduction to the work linked to above — and then read the rest!

Enjoy!

Further reading: Y.R. Chao’s translation into Gwoyeu Romatzyh of the Humpty Dumpty section of Through the Looking-Glass, with Hanyu Pinyin and English