One of the myths about Chinese characters is that for each character there is One True Way and One True Way Only for it to be written, with a specific number of specific strokes in a certain specific and invariable order. Generally speaking, characters are indeed taught with standard stroke orders with certain numbers of strokes (the patterns help make it less difficult to remember how characters are written) — but these can vary from place to place, though the characters may look the same. Moreover, people often write characters in their own fashion, though they may not always be aware of this.
Michael Kaplan of Microsoft recently examined the stroke data from standards bodies in China for all 70,195 “ideographs” [sic] in Unicode 5.0 and compared it against “the 54,195 ideographs for which stroke count data was provided by Taiwan standards bodies” to see how much of a difference there was in the stroke counts for the characters that both sides provided data for.
(I’m a bit surprised the two sides have compiled such extensive lists, and I’d love to see them. But that’s another matter.)
He found that 9,768 of these characters (18 percent) have different stroke counts between the two standards, with 9,045 characters differing by 1 stroke, 675 characters by 2 strokes, 44 characters by 3 strokes, 2 characters by 4 strokes, 1 character by 5 strokes, and 1 character by 6 strokes.
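For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The figures are the ones quoted above; the variable names are my own, and nothing here touches the actual stroke data.

```python
# Number of characters by size of the stroke-count difference,
# as quoted from Kaplan's post.
diff_counts = {1: 9045, 2: 675, 3: 44, 4: 2, 5: 1, 6: 1}

# Characters for which both China and Taiwan supplied stroke counts.
compared = 54195

total_differing = sum(diff_counts.values())
share_differing = total_differing / compared
share_off_by_one = diff_counts[1] / total_differing

print(f"characters with differing counts: {total_differing}")
print(f"share of all compared characters: {share_differing:.1%}")
print(f"share of those differing by one stroke: {share_off_by_one:.1%}")
```

The buckets do add up to the stated total, and the great majority of the mismatches are off by a single stroke.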
Note: This is about stroke counts of matching characters, not about differing stroke counts for traditional and “simplified” characters — e.g., not 國 (11 strokes) vs 国 (8 strokes).
So, is this a case of chabuduoism, or of truly differing standards? The answer is not yet fully clear; but be sure to read Kaplan’s post and the comments there.
sources and additional info:
- How bad does it need to be in order to be not good enough, anyway?, Sorting It All Out, November 22, 2007
- Chángyòng guózì biāozhǔn zìtǐ bǐshùn shǒucè (常用國字標準字體筆順手冊), standard stroke-order handbook for commonly used Chinese characters, Taiwan Ministry of Education
Interesting. I just looked at one character, U+25F22, and tried to see how it could possibly have the number of strokes assigned to it by either system (20 for Taiwan and 16 for China). I was able to get 16 by cutting some corners, but there was no way I could come up with 20. Either I’m doing something wrong, or this is a case of “chabuduoism”.
I was able to come up with 19 strokes by dividing the character 𥼢 into the following separate pieces:
米 on top (6 strokes)
十 two times (4 strokes)
申 in the middle (5 strokes)
木 on the bottom (4 strokes)
Is there any way to squeeze one more stroke out of this?
It seems unlikely that anyone would actually write the character this way, rather than using a single vertical line down the middle. (On the other hand, how the character is written may be an open question. I wonder how many times this character has ever been handwritten by human beings alive today.)
Zev: 19 was the most I could come up with as well. 16 comes from not splitting the horizontal bar between the two 十, and from drawing one single line down the middle. The only thing I could think of is that they don’t properly count the top and right sides of the box in 申 as one stroke, but that would be a mistake.
What you have to realise is that it is no use trying to get the correct stroke count from the character you can see in the Unicode code charts or that your font has, because the stroke counts are derived from a Taiwan reference font that has some unfortunately muddled glyphs. Looking at the draft multi-column source glyph chart for CJK-B (IRG N1381) that has just been released for review, we can see that:
U+25F22 𥼢 is 20 strokes instead of whatever you expect because its Taiwan source glyph is actually the same as U+25F52 𥽒 (under 米 plus 14 in the Kangxi Dictionary)
U+272F0 𧋰 is 19 strokes instead of the expected 13 strokes because its Taiwan source glyph is actually the same as U+27499 𧒙 (under 虫 plus 13 in the Kangxi Dictionary)
U+28F71 𨽱 is 24 strokes instead of the expected 29 strokes because its Taiwan source glyph is actually the same as U+28F70 𨽰 (under 阝 plus 21 in the Kangxi Dictionary)
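The three cases above can be sanity-checked with a short Python sketch: if the Taiwan count really comes from the look-alike glyph, then the radical's stroke count plus the residual strokes should equal the reported Taiwan count. The radical stroke counts are filled in by hand and are an assumption on my part (in particular, 阝 is counted as 3 strokes rather than the 8 of the full form 阜); the function and variable names are purely illustrative.

```python
# Stroke counts of the radicals as written in the list above.
# Assumption: 阝 is counted in its 3-stroke abbreviated form, not as 阜 (8 strokes).
RADICAL_STROKES = {"米": 6, "虫": 6, "阝": 3}

# (affected code point, reported Taiwan stroke count,
#  look-alike code point, radical, residual strokes of the look-alike)
CASES = [
    (0x25F22, 20, 0x25F52, "米", 14),
    (0x272F0, 19, 0x27499, "虫", 13),
    (0x28F71, 24, 0x28F70, "阝", 21),
]

def radical_plus_residual(radical, residual):
    """Total stroke count implied by a radical-plus-residual index entry."""
    return RADICAL_STROKES[radical] + residual

for affected, taiwan_count, lookalike, radical, residual in CASES:
    total = radical_plus_residual(radical, residual)
    verdict = "matches" if total == taiwan_count else "does not match"
    print(f"U+{affected:05X}: Taiwan count {taiwan_count} {verdict} "
          f"U+{lookalike:05X} ({radical} + {residual} = {total})")
```

Under those assumptions all three reported counts agree with the look-alike characters, which is consistent with the muddled-glyph explanation.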
I discuss the case of U+272F0 in more detail in my latest blog post.
Andrew:
Coincidentally, I was reading your blog post at the very moment your comment showed up here. Great detective work!
So, everyone, be sure to follow up by reading Andrew’s CJK-B Case Study #1 : U+272F0 and Kaplan’s response at Every character has a story #31: U+272f0 from CJK Extension B, an ideograph that proves that every rose has its thorn!.
I just noticed I left out the link to the stroke-order handbook from Taiwan’s Ministry of Education. I’ve corrected the omission.
The MSDN links are broken. Here are some alternatives:
http://archives.miloush.net/michkap/archive/2007/11/22/6462768.html
http://archives.miloush.net/michkap/archive/2007/12/03/6643180.html