Zhou Youguang on politics

The New York Times has just published a profile of Zhou Youguang, who is often called “the father of Pinyin” (though he modestly prefers to stress that others worked with him): A Chinese Voice of Dissent That Took Its Time.

This profile focuses not only on Zhou’s role in the creation of Hanyu Pinyin but also on his political views, about which he has become increasingly public.

About Mao, he said in an interview: “I deny he did any good.” About the 1989 Tiananmen Square massacre: “I am sure one day justice will be done.” About popular support for the Communist Party: “The people have no freedom to express themselves, so we cannot know.”

As for fostering creativity in the Communist system, Mr. Zhou had this to say, in a 2010 book of essays: “Inventions are flowers that grow out of the soil of freedom. Innovation and invention don’t grow out of the government’s orders.”

No sooner had the first batch of copies been printed than the book was banned in China.

Although the reporter’s assertion, following the PRC’s official figures, that “China all but stamp[ed] out illiteracy” is well wide of the mark, there is no denying Pinyin’s crucial role in this area. I recommend reading the whole article.

Zhou Youguang

Pinyin sort order

The standard for alphabetically sorting Hanyu Pinyin is given in the ABC dictionary series edited by John DeFrancis and issued by the University of Hawaii Press.

Here’s the basic idea:

The ordering is primarily simply alphabetical. Diacritical marks, punctuation, juncture and capitalization are only taken into account when the strings being compared are otherwise identical. For example, píng’ān sorts before pīnyīn, because pingan sorts before pinyin, because g precedes y alphabetically.

Only when two strings are alphabetically identical is non-alphabetical information taken into account.
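As a rough illustration of that first, purely alphabetical pass, here is a minimal Python sketch that reduces a Pinyin string to its bare letters before comparing. The function name `base_letters` is hypothetical (it does not come from the ABC guide), and the sketch deliberately ignores everything the guide treats as a tiebreaker: tones, apostrophes, capitalization, and the u/ü distinction.

```python
import unicodedata


def base_letters(s: str) -> str:
    """Return only the bare lowercase letters of a Pinyin string.

    NFD decomposition splits a letter such as "ī" into "i" plus a
    combining macron; combining marks and punctuation are not
    alphabetic, so the isalpha() filter drops them.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if ch.isalpha()).lower()


# Compare on bare letters only: "pingan" versus "pinyin".
words = ["pīnyīn", "píng’ān"]
print(sorted(words, key=base_letters))
```

Because g precedes y, the entry with the apostrophe comes out first here, regardless of its tone marks.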

The series’ Reader’s Guide presents the specifics of the sort order. Since I don’t have to worry about how much space this takes up on my site, I have reformatted the information slightly to give the examples as numbered lists.

Head entry transcriptions with the same sequence of letters are ordered first strictly by letter sequence regardless of tones, then by initial syllable tone in the sequence 0 1 2 3 4. For entries with the same initial tone, arrangement is by the tone of the second syllable, again in the order 0 1 2 3 4. For example:

  1. shīshi
  2. shīshī
  3. shīshí
  4. shīshǐ
  5. shīshì
  6. shíshī
  7. shíshì
  8. shǐshī
  9. shìshī

Irrespective of tones, entries with the vowel u precede those with ü.
For example:

  1. l?
  2. l?
  3. l?
  4. l?
  1. n?

Entries without apostrophe precede those with apostrophe. For example:

  1. biàn (argue)
  2. bǐ’àn (the other shore)

Lower-case entries precede upper-case entries. For example:

  1. hòujìn (aftereffect)
  2. Hòu Jìn (Later Jin dynasty)

For entries with identical spelling, including tones, arrangement is by order of frequency….

For most users, the most important thing to note is that the neutral tone is regarded as 0, not as 5. Thus, the order is not “ā á ǎ à a” but “a ā á ǎ à.” And, because lowercase comes before uppercase, not “A a Ā ā Á á Ǎ ǎ À à” but “a A ā Ā á Á ǎ Ǎ à À.”
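The rules quoted above can be approximated with a multi-level sort key. What follows is only a sketch under stated assumptions, not the ABC editors' actual procedure: the name `pinyin_sort_key` is hypothetical; tones are recorded per letter rather than per syllable (which avoids syllable segmentation and gives the same result whenever two entries share the same letters and syllable breaks); spaces and hyphens are simply ignored; the relative rank of the u/ü rule and the apostrophe rule is my own guess; and the final frequency tiebreaker is left out.

```python
import unicodedata

# Combining marks produced by NFD decomposition, mapped to tone numbers.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
DIAERESIS = "\u0308"
APOSTROPHES = {"'", "\u2019"}


def pinyin_sort_key(s):
    """Build a tuple that compares level by level:
    letters, then u/ü, then apostrophes, then tones, then case."""
    letters = []    # bare lowercase letters (primary level)
    umlauts = []    # per letter: 1 for ü, 0 otherwise (u before ü)
    junctures = []  # per letter: 1 if an apostrophe precedes it
    tones = []      # per letter: 0 if unmarked (neutral), else 1-4
    cases = []      # per letter: 0 lowercase, 1 uppercase
    pending_apostrophe = 0
    for ch in unicodedata.normalize("NFD", s):
        if ch in TONE_MARKS and letters:
            tones[-1] = TONE_MARKS[ch]
        elif ch == DIAERESIS and letters:
            umlauts[-1] = 1
        elif ch in APOSTROPHES:
            pending_apostrophe = 1
        elif ch.isalpha():
            letters.append(ch.lower())
            umlauts.append(0)
            junctures.append(pending_apostrophe)
            tones.append(0)
            cases.append(1 if ch.isupper() else 0)
            pending_apostrophe = 0
    return ("".join(letters), tuple(umlauts), tuple(junctures),
            tuple(tones), tuple(cases))


entries = ["Hòu Jìn", "bǐ’àn", "hòujìn", "biàn"]
print(sorted(entries, key=pinyin_sort_key))
```

Putting the apostrophe level ahead of the tone level is what lets biàn precede bǐ’àn even though tone 3 would otherwise come before tone 4.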

One can see this in action in the A entries for the ABC English-Chinese, Chinese-English Dictionary. And here are some sample pages from an earlier ABC dictionary.

The ABC series follows the example of the Hanyu Pinyin Cihui (汉语拼音词汇 / Hànyǔ Pīnyīn Cíhuì) (example), with only one minor difference, as noted by Tom Bishop:

HPC [Hanyu Pinyin Cihui] gave hyphens and spaces the same priority as apostrophes, so that lìgōng sorted before lǐ-gōng, in spite of the tones. Usage of hyphens and spaces in pinyin is still far from being fully standardized. (The same is true in English orthography.) Consequently, for collation it makes sense to give less weight to hyphens and spaces, and more weight to tones, thus sorting lǐ-gōng before lìgōng. In ABC, hyphens and spaces don’t affect the sort order unless they change the pronunciation in the same way that apostrophe would; for example, ¹míng-àn 明暗 and ²míng’àn 冥暗 are treated as homophones, and they sort after mǐngǎn 敏感.
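To make the difference between the two conventions concrete, here is a small Python sketch. It is an illustration only: `collation_key` and its `hpc_style` flag are hypothetical names, and I am assuming that a hyphen or space "changes the pronunciation in the same way that apostrophe would" exactly when it stands before a, o, or e, which is where Pinyin orthography calls for an apostrophe. The u/ü and capitalization levels are omitted to keep the example short.

```python
import unicodedata

TONES = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
APOSTROPHES = "'\u2019"
SEPARATORS = APOSTROPHES + "- "


def collation_key(s, hpc_style=False):
    """Key of (letters, junctures, tones).

    hpc_style=True  : every hyphen or space counts like an apostrophe.
    hpc_style=False : a hyphen or space counts only before a, o, or e
                      (an assumed reading of the ABC rule).
    """
    letters, junctures, tones = [], [], []
    pending = None  # last separator seen since the previous letter
    for ch in unicodedata.normalize("NFD", s):
        if ch in TONES and letters:
            tones[-1] = TONES[ch]
        elif ch in SEPARATORS:
            pending = ch
        elif ch.isalpha():
            if pending is None:
                counts = False
            elif pending in APOSTROPHES or hpc_style:
                counts = True
            else:
                counts = ch.lower() in "aoe"
            letters.append(ch.lower())
            junctures.append(1 if counts else 0)
            tones.append(0)
            pending = None
    return ("".join(letters), tuple(junctures), tuple(tones))


pair = ["lìgōng", "lǐ-gōng"]
print(sorted(pair, key=lambda w: collation_key(w, hpc_style=True)))
print(sorted(pair, key=collation_key))
```

Under the first setting the hyphenated form sorts last in spite of its lower tone; under the second the hyphen is ignored and the tones decide.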

Pinyin font: Linux Biolinum

The highly useful and Pinyin-friendly Linux Libertine has a companion font family: Linux Biolinum.

Biolinum is designed for emphasis, e.g. of titles. You can also use it for short passages of text. For longer texts a serif font such as the Libertine should be used for readability. The Biolinum has the same vertical metrics and visual weight as the Libertine, so that it fits perfectly to the Libertine and can be also used for emphasizing within the body text.

Linux Biolinum Capitals and Linux Biolinum Keyboard don’t presently work with Pinyin. But the other styles do, as this sample of Linux Biolinum with Pinyin text shows.

Some fonts *not* to use for Pinyin

One of the traditions in advance of Chinese New Year is housecleaning — something not among my favorite activities. But I thought I’d do a bit of housecleaning of half-finished posts and get at least one up before the new year (tomorrow). So here it is.

Although I occasionally bemoan the fact that relatively few font families are made such that they can handle Hanyu Pinyin with tone marks (at least not right out of the box), it’s worth noting that some of the commonly found fonts that do cover all of the letters and diacritics really suck at it and should be avoided when writing in Pinyin.

Typically, such fonts were designed mainly with Hanzi in mind.

Here’s one example:
screenshot of a Pinyin text set in Adobe Ming -- and, boy oh boy, is it ever hideous

Hideous.

That was Adobe Ming. Yes, Adobe.

I’ll go ahead and point out the obvious problems:

And I’m not so sure about the consistency of the x-height either. Those stubby little descenders are puzzling, too, but are not necessarily wrong.

Perhaps the designers intended these letters for use in vertically aligned text — though I don’t think these forms would work well even then. Perhaps there’s some context in which these might make sense, though I’m inclined to doubt this. Perhaps the designers have an irrational hatred of romanization and wanted to make Pinyin look as ugly as possible. Whatever the reason, even though this and the other Unicode-compliant fonts below have all of the letters with diacritics that Pinyin requires, using them for Pinyin texts would be a very bad idea.

Since there is apparently still some confusion about why the “ɑ” form (in contrast to the normal “a” form) is incorrect, see the chart below.

table showing that the fonts discussed in this post that use the rounded style for the letter 'a' do so only with diacritics, not elsewhere. This is wrong. The rounded a's should not be used at all.

Note how the odd form of the letter a does not appear in regular text or even in double-width forms; instead, it’s seen only when accompanied by a tone mark. In other words, even within individual fonts the ɑ form is treated not as a normal “a” that happens to look that way but as something specifically for Pinyin, which is flat-out wrong. Other than the addition of diacritics themselves, there is no reason to alter letter shapes in any way for Pinyin.

Let’s get back to the broader issue. Here are some more examples of fonts that render Pinyin in ugly ways. (Click image to view PDF.)

click to view PDF with much larger and clearer text

To aid Web searches, here’s a text list of the fonts above, none of which should be used for Hanyu Pinyin:

  • Adobe Fangsong Std
  • Adobe Heiti Std
  • Adobe Kaiti Std
  • Adobe Ming Std
  • Adobe Song Std
  • MS Gothic
  • MS Mincho
  • MS PGothic
  • MS PMincho
  • MS UI Gothic
  • NSimSun
  • SimHei
  • SimSun

SimSun is probably the least awful of the bunch. But even so, there’s no good reason to use it instead of something else that would do the job much better, such as Gentium:
screenshot of the same Pinyin text, but this one is set in Gentium -- and it looks great

Generally speaking, if you wouldn’t want to use a font for English, French, Italian, etc., then don’t use it for Hanyu Pinyin.

Say no to making Pinyin ugly!

I wish you all a happy and Pīnyīn-rich year of the dragon.

Pinyin font: Noticia Text

Since my last examination of the selection at Google Web Fonts the number of font families for Latin Extended has reached 98 [edit: May 31, 2012: 188], with one new face capable of rendering Hanyu Pinyin with tone marks: Noticia Text.

image showing the font Noticia Text in action on a Hanyu Pinyin sample text

Here are the Pinyin-friendly font faces at Google Web Fonts.

Serif

  • EB Garamond
  • Gentium Basic
  • Gentium Book Basic
  • Neuton
  • Noticia Text

Sans Serif

  • Andika
  • Ubuntu
  • Ubuntu Condensed
  • Ubuntu Mono

For future reference, the font most recently added to the Latin Extended group is Ruda [edit in May 2012: Chau Philomene One], which doesn’t support Pinyin with diacritics (except, perhaps, through combining diacritics).
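For anyone who wants to check a font's coverage mechanically, the following Python sketch lists the precomposed tone-marked vowels that Pinyin needs and reports which ones a given set of code points lacks. The function names are hypothetical, and the set of covered code points is assumed to come from elsewhere (for instance, from a font-parsing library); the sketch also leaves out rarer letters such as ê and the tone-marked m and n.

```python
import unicodedata

VOWELS = "aeiouü"
# Combining macron, acute, caron, grave (tones 1 to 4).
TONE_MARKS = ["\u0304", "\u0301", "\u030c", "\u0300"]


def pinyin_letters():
    """Precomposed upper- and lowercase tone-marked vowels, plus ü and Ü."""
    chars = {"ü", "Ü"}
    for vowel in VOWELS + VOWELS.upper():
        for mark in TONE_MARKS:
            # NFC composes base letter plus mark into a single character.
            chars.add(unicodedata.normalize("NFC", vowel + mark))
    return chars


def missing_from(covered_codepoints):
    """Return the Pinyin letters whose code points are not covered."""
    return sorted(ch for ch in pinyin_letters()
                  if ord(ch) not in covered_codepoints)


# Example: a font that stops at Latin-1 (U+0000 to U+00FF).
latin1_only = set(range(0x100))
print(missing_from(latin1_only))
```

A font limited to Latin-1 has the acute and grave vowels but none of the macron or caron forms, which is why first- and third-tone letters are usually the ones that fail.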


The Bible Code

I recently finished the compelling Fireproof Moth: A Missionary in Taiwan’s White Terror, by Milo Thornberry, who secretly helped democracy advocate Peng Ming-min escape Taiwan (and thus also possible assassination by the KMT) in the bad old days (being, in this case, early 1970). Soon thereafter, Thornberry and his wife, now Judith Thomas, became the first missionary couple to be deported from Taiwan since the Japanese era, though not for their assistance in the escape of Peng, which the authorities did not uncover. Neither did Washington or Beijing.

For that matter, even though the authorities had people assigned to watch Peng night and day, they did not know for weeks that he had slipped away. Here’s how Peng relates this:

My successful escape had stirred up a hornet’s nest. Senior government officers were certain that I could not be in Sweden because their records, the reports of their subordinates, showed that I had been traveling here and there in Formosa until the very day the news of my escape became known, almost three weeks after I left my house in Taipei. According to these reports, I had been staying in the best hotels, eating at expensive restaurants, and enjoying the cinema. The proof in their hands were the police bills charged against the special account for my surveillance.

Then the truth became evident. During the months in which I had so often secluded myself for long periods, and probably since I was released from prison in 1965, and during the weeks after I had left the island, my guards, the Investigation Bureau agents, and the police had been submitting falsified accounts, false expense vouchers and claims, and pocketing the money.

Of course that, delicious as it is, has nothing to do with the usual subjects of this blog. So here’s my excuse for bringing this up. In the following passage Thornberry describes the scene at the airport as he is awaiting deportation.

It took half an hour for four people to go through our few bags. They went through everything — jars of cold cream, tubes of toothpaste, and every piece of paper in my briefcase.

A difficulty arose when they found some sermons I had in my briefcase. Since they were written in Romanized Chinese, no one there could read them. They assumed they were some kind of secret code. So I spent several minutes with one of the men, reading the sermons to him and a couple of others who were looking on. I pointed to the words as I read. Finally, they decided that they were what I said they were and allowed me to put them back into my briefcase. I felt a certain irony as I preached to one of my guards in my last moments in Taiwan.

Pinyin fonts at the Open Font Library

A search for Pinyin fonts at the Open Font Library currently yields 15 font families.

Not all of those, however, really do support Hanyu Pinyin with tone marks. Here are the ones that work, though not always without problems:

And here’s a PDF of all of those Unicode Pinyin font families in action.

I’ve previously mentioned more than one of these: Pecita and the various Gentium faces. I’ll write more about the latter in another post on the work coming out of SIL.

Serif

screenshot of the serif font 'crimson' in action on a sample Pinyin text

screenshot of the serif fonts 'Gentium' and 'Gentium Book' in action on a sample Pinyin text

screenshot of the serif font 'Judson' in action on a sample Pinyin text

screenshot of the serif font 'Libertinage' in action on a sample Pinyin text

screenshot of the serif font 'Wirewyrm' in action on a sample Pinyin text

Sans Serif

screenshot of the sans-serif font 'Designosaur' in action on a sample Pinyin text

screenshot of the sans-serif font 'News Cycle' in action on a sample Pinyin text

screenshot of the sans-serif font 'Pfennig' in action on a sample Pinyin text

Monospace

screenshot of the monospace sans-serif font 'Consola Mono' in action on a sample Pinyin text

Script

screenshot of the script font 'Pecita' in action on a sample Pinyin text

Full list (including fails), for future reference:

  1. Anahi/Abbey
  2. Consola Mono
  3. Crimson
  4. Designosaur
  5. Douar Outline
  6. Futhark Adapted
  7. Gentium
  8. Judson
  9. Libertinage
  10. Logisoso
  11. News Cycle
  12. Pecita
  13. Pfennig
  14. Vegesignes
  15. WireWyrm

Google Web fonts and Pinyin — December 2011 update

When I put up my first post on Google Web fonts (Google Web fonts and Hanyu Pinyin), that site offered 252 font families, 29 of which cover at least parts of Latin Extended. Now, some three months later, the total has grown to 342 font families, with 70 of those covering at least parts of Latin Extended.

Only two of the new families, however, support Hanyu Pinyin with tone marks: Ubuntu Condensed and Ubuntu Mono. That brings the total to eight Google Web fonts that support Hanyu Pinyin: four serifs and four sans serifs.

Serif

  • EB Garamond
  • Gentium Basic
  • Gentium Book Basic
  • Neuton

Sans Serif

  • Andika
  • Ubuntu
  • Ubuntu Condensed
  • Ubuntu Mono

Here’s what the two new families, Ubuntu Condensed and Ubuntu Mono, look like next to the earlier Ubuntu.

example of Ubuntu, Ubuntu Condensed, and Ubuntu Mono in action on Hanyu Pinyin

For reference, here’s the total list of Latin Extended, with Pinyin-compliant fonts marked with an asterisk.

Serif Faces

  1. Bitter
  2. Cardo
  3. Caudex
  4. EB Garamond *
  5. Enriqueta
  6. Gentium Basic *
  7. Gentium Book Basic *
  8. Neuton *
  9. Playfair Display
  10. Radley
  11. Sorts Mill Goudy

Sans Serif Faces

  1. Andika *
  2. Anonymous Pro
  3. Anton
  4. Chango
  5. Didact Gothic
  6. Francois One
  7. Fresca
  8. Istok Web
  9. Jockey One
  10. Jura
  11. Marmelad
  12. Open Sans Condensed
  13. Open Sans
  14. Play
  15. Signika Negative
  16. Signika
  17. Tenor Sans
  18. Ubuntu *
  19. Ubuntu Condensed *
  20. Ubuntu Mono *
  21. Varela
  22. Viga

Display Faces (all fail)

  1. Abril Fatface
  2. Arbutus
  3. Bubblegum Sans
  4. Butcherman Caps
  5. Chicle
  6. Eater Caps
  7. Forum
  8. Kelly Slab
  9. Knewave
  10. Lobster
  11. MedievalSharp
  12. Modern Antiqua
  13. Nosifer Caps
  14. Piedra
  15. Passion One
  16. Plaster
  17. Rammetto One
  18. Ribeye Marrow
  19. Ribeye
  20. Righteous
  21. Ruslan Display
  22. Stint Ultra Condensed

Handwriting Faces (all fail)

  1. Aguafina Script
  2. Aladin
  3. Devonshire
  4. Dr Sugiyama
  5. Fondamento
  6. Herr Von Muellerhoff
  7. Marck Script
  8. Miss Fajardos
  9. Miss Saint Delafield
  10. Monsieur La Doulaise
  11. Mr Bedford
  12. Mr Dafoe
  13. Mr De Haviland
  14. Mrs Sheppards
  15. Niconne
  16. Patrick Hand