Saint Joe’s

A Catholic church in Jinlun (Jīnlún/金崙), Taidong, Taiwan. Note the absence of Chinese characters.

photo of a church, with 'KIOKAI NI' and 'SANTO YOSEF' written on it in large letters

The town of Jinlun being in an area with many members of the Paiwan tribe, I checked with Chen Chun-Mei (Chén Chūnměi / 陳春美), a Paiwan specialist at Guólì Zhōngxīng Dàxué (National Chung Hsing University / 國立中興大學), who wrote that kiokai is one of many words Paiwan borrowed from Japanese (kyōkai / 教会, meaning “church”), and that ni in Paiwan means “of” or “by.”

So this is the Church of Saint Joseph.

I was also interested to hear on the train to Jinlun that some of the announcements ahead of certain stations in Taidong County were given not only in Mandarin, Taiwanese, and English but also in an aboriginal language. I’m guessing Paiwan. Even in the announcements in that language, however, the place names themselves sounded like they were given in their Mandarin forms, though the descriptions were not.

Zhou Youguang on NPR

Louisa Lim had a story on National Public Radio yesterday about Zhou Youguang (周有光 / Zhōu Yǒuguāng), who’s often referred to as the father of Pinyin.

Most stories in the mass media about him focus on just two things, which might be summarized as “pinyin” and “wow, he’s really old.” This story, however, draws welcome notice to some other things about him, as the title reveals: At 105, Chinese Linguist Now A Government Critic. (There’s a link to the audio version near the top of the page. Zhou can be heard in the background speaking Mandarin, though his English is excellent.)

The article also provides a link to his blog: Bǎisuì xuérén Zhōu Yǒuguāng de bókè (百岁学人周有光的博客).

Hat tip to John Rohsenow.

photo of Zhou Youguang signing a book for me

Taimali signage examples

street sign reading Tai Fong Rd.

Here are some signs in Taimali (Tàimálǐ / 太麻里), Taidong County, Taiwan. Wherever the spellings are distinctive, they’re in Tongyong Pinyin, even though they should have been replaced by Hanyu Pinyin years ago. When the change to Tongyong Pinyin was implemented, signs under national control (e.g., highway signs) were switched relatively quickly throughout the country. This has not been the case with the switch to Hanyu Pinyin, especially in the south.

Note that the “Taimali” in the sign for the Taimali Railway Station is on a sticker rather than on the original sign. This is a bit odd, given that this is spelled exactly the same in all of the romanization systems commonly seen in Taiwan: Hanyu Pinyin, Tongyong Pinyin, MPS2, and bastardized Wade-Giles. So maybe what’s under the sticker was just an error. Taiwan’s signs certainly have their share of typos too. (Sometimes the authorities will even use a sticker to “correct” the right spelling with something else.)

Click any of the images below for a larger version.

two signs reading Taimali Railway Station / Jinjhen Mountain

closeup of two signs reading Taimali Railway Station / Jinjhen Mountain

directional highway signs reading Jhiben / Dawu

street signs reading Rih Sheng Rd. / Min Cyuan Rd.

street signs reading Rih Sheng Rd. / Tai Fong Rd.

shot of the Taimali Railway Station, showing jinzhen flowers drying on the road

Now on Pinyin.info: Weishenme Zhongwen zheme TM nan?

Earlier this year a Mandarin translation of David Moser’s classic essay Why Chinese Is So Damn Hard appeared on the Web. And then it disappeared. With the permission of both the translator and the original author, I’m placing this work back online.

It’s available here in two versions:

Enjoy!

Maybe I’ll make a Pinyin version too one of these years.

Google Translate’s Pinyin converter: now with apostrophes

Google has taken another major step toward making Google Translate’s Pinyin converter decent. Finally, apostrophes.

Not long ago “阿爾巴尼亞然而仁愛蓮藕普洱茶” would have yielded “Āěrbāníyǎ ránér rénài liánǒu pǔěr chá.” But now Google produces the correct “Ā’ěrbāníyǎ rán’ér rén’ài lián’ǒu pǔ’ěr chá.” (Well, one could debate whether that last one should be pǔ’ěr chá, pǔ’ěrchá, Pǔ’ěr chá, Pǔ’ěr Chá, or Pǔ’ěrchá. But the apostrophe is undoubtedly correct regardless.)
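The apostrophe rule at work in those examples (a syllable beginning with a, e, or o gets an apostrophe when it follows another syllable within the same word) can be sketched in a few lines of Python. This is only an illustration of the rule, not how Google’s converter works. The function names are made up for the example, and it assumes the text has already been split into syllables and grouped into words, which is the harder part.

```python
import unicodedata

APOSTROPHE = "\u2019"  # the typographic apostrophe used in the examples above


def starts_with_a_e_o(syllable: str) -> bool:
    """Return True if the syllable begins with a, e, or o, ignoring tone marks."""
    # NFD splits a letter such as "ě" into "e" plus a combining mark,
    # so the first code point is always the bare base letter.
    base = unicodedata.normalize("NFD", syllable)[0].lower()
    return base in "aeo"


def join_syllables(syllables: list[str]) -> str:
    """Join the syllables of ONE word, adding apostrophes where the rule requires."""
    word = syllables[0]
    for syllable in syllables[1:]:
        if starts_with_a_e_o(syllable):
            word += APOSTROPHE
        word += syllable
    return word


print(join_syllables(["rán", "ér"]))        # rán’ér
print(join_syllables(["péng", "you", "men"]))  # péngyoumen
```

The first syllable of a word never takes an apostrophe, which is why the loop starts from the second one.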

Also, the -men suffix is now solid with words (e.g., 朋友們 –> péngyoumen and 孩子們 –> háizimen). This is a small thing but nonetheless welcome.

The most significant remaining fundamental problem is the capitalization and parsing of proper nouns.

And numbers are still wrong, with everything being written separately. For example, “七千九百四十三萬五千六百五十八” should be rendered as “qīqiān jiǔbǎi sìshísān wàn wǔqiān liùbǎi wǔshíbā.” But Google is still giving this as “qī qiān jiǔ bǎi sì shí sān wàn wǔ qiān liù bǎi wǔ shí bā.”
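The grouping in the correct rendering above can be approximated with a small Python sketch. The rule it encodes is inferred only from that one example: a digit is written solid with a following qiān or bǎi, tens and ones are written together, and wàn stands alone. The names here (PINYIN, number_to_pinyin) are hypothetical, the table covers only the characters needed for numbers of this shape (no 零, no 億), and none of this reflects Google’s code.

```python
import unicodedata

# Readings for the numeral characters this sketch handles (an intentionally small set).
PINYIN = {
    "一": "yī", "二": "èr", "三": "sān", "四": "sì", "五": "wǔ",
    "六": "liù", "七": "qī", "八": "bā", "九": "jiǔ",
    "十": "shí", "百": "bǎi", "千": "qiān", "萬": "wàn",
}
APOSTROPHE = "\u2019"


def needs_apostrophe(syllable: str) -> bool:
    """True for syllables beginning with a, e, or o (only èr, among the numerals)."""
    return unicodedata.normalize("NFD", syllable)[0] in "aeo"


def number_to_pinyin(hanzi: str) -> str:
    """Convert a Hanzi numeral to Pinyin, grouping syllables into words."""
    words = []
    current = ""
    for char in hanzi:
        syllable = PINYIN[char]  # raises KeyError for characters outside the table
        if char == "萬":
            # wàn stands alone: close any word in progress first.
            if current:
                words.append(current)
                current = ""
            words.append(syllable)
            continue
        if current and needs_apostrophe(syllable):
            current += APOSTROPHE  # e.g. shí’èr
        current += syllable
        if char in "百千":
            # bǎi and qiān close the word begun by the preceding digit.
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return " ".join(words)


print(number_to_pinyin("七千九百四十三萬五千六百五十八"))
# qīqiān jiǔbǎi sìshísān wàn wǔqiān liùbǎi wǔshíbā
```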

On the other hand, Google is starting to deal with “le,” appending it to verbs. This is relatively tricky to get right, so I’m not surprised Google doesn’t have the details down yet.

So there’s still a lot of work to be done. But I’m heartened that progress is being made in areas of fundamental importance.

The current state:
screen shot of what Google Translate's Pinyin converter produces as of late September 2011

Kindles and Pinyin

Sure, Amazon Kindles can store thousands of books, play mp3 files, provide Web access, and allow one to spend money on books with alarming ease. But can they handle Pinyin?

photo of a Kindle 3 displaying the opening of 'Muqin Chujia' -- showing that all tone marks appear correctly

Yes!

This test was made on a Kindle 3 purchased at a U.S. retail store. All three typefaces — regular, condensed, and sans serif — worked well.

Yes, Kindles can display Hanzi as well — though there may be some problems with those appearing correctly in book titles in the device’s index.

Below are links to my files, in case you want to test this yourself. I’d appreciate hearing about how Nook and other devices handle this. Thanks.

Script font for Pinyin

Unfortunately, relatively few fonts support Hanyu Pinyin (with tone marks, that is). So I was surprised to come across Pecita, by Philippe Cochy. This is the first script typeface I recall seeing that covers Pinyin … and a lot more.

It might be too individualistic for much Pinyin use. But I’m very glad to know it exists and hope to see many more creations like it.

GIF of Pecita in action: A-Z, a-z, plus the diacritics used in Pinyin and a pinyin pangram

Pecita is licensed under the SIL Open Font License, Version 1.1.

Google Web fonts and Hanyu Pinyin

Back in the last century, getting Web browsers to correctly display Pinyin was such a troublesome task that I remember once even employing GIFs of first- and third-tone letters to get those to look right. So there were a whole lotta IMG tags in my text. Sure, I put the necessary info in ALT attributes (e.g., alt='a3'), just in case. But, still, I shudder to recall having to resort to that particular hack.

Things are better now, though still far from ideal. Website viewers don’t all have the font you may wish to use, and something that promises to improve that situation considerably is CSS3’s @font-face, which lets those creating Web pages employ fonts that are provided online. Google is helping with this through its Google Web Fonts. (Current count: 252 font families.)

But is anything in Google’s collection capable of dealing with Hanyu Pinyin? Armed with a handy-dandy Pinyin pangram, I had a look at what Google has made available.

Not surprisingly, most of the 29 font families marked as offering the “Latin Extended” character set failed to handle the entire Hanyu Pinyin set. The tone-marked ü group (ǖ, ǘ, ǚ, ǜ) is the most likely to be unsupported at present, with third-tone vowels also frequently missing.
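For anyone who would rather check a font’s coverage programmatically than by eye, here is a minimal Python sketch of the idea. It assumes you already have the set of Unicode code points a font covers (for instance, read from its character map with a font tool of your choice; that step is outside this sketch), and the function names are hypothetical. It builds the list of tone-marked vowels itself rather than relying on a pangram.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = ["\u0304", "\u0301", "\u030c", "\u0300"]
VOWELS = "aeiou\u00fc"  # a e i o u ü


def pinyin_letters() -> list[str]:
    """Return ü/Ü plus every tone-marked vowel, lower and upper case."""
    letters = ["\u00fc", "\u00dc"]
    for vowel in VOWELS + VOWELS.upper():
        for mark in TONE_MARKS:
            # NFC composes vowel + mark into a single precomposed letter.
            letters.append(unicodedata.normalize("NFC", vowel + mark))
    return letters


def missing_pinyin_letters(supported: set[int]) -> list[str]:
    """List the Pinyin letters whose code points are not in `supported`."""
    return [ch for ch in pinyin_letters() if ord(ch) not in supported]


# Example: a font covering only Latin-1 and Latin Extended-A.
coverage = set(range(0x20, 0x180))
print("".join(missing_pinyin_letters(coverage)))
```

With that example coverage, the letters reported missing are the third-tone a, i, o, and u and the tone-marked ü letters, all of which live in Latin Extended-B. That is consistent with the pattern of failures described above.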

Here are the Google Web fonts that do support Hanyu Pinyin with tone marks:
Serifs

  • EB Garamond (227 KB)
  • Gentium Basic (263 KB — and about the same for each of the three accompanying styles: italic, bold, bold italic)
  • Gentium Book Basic (267 KB — and about the same for each of the three accompanying styles: italic, bold, bold italic)
  • Neuton (56 KB — and about the same for each of the five accompanying styles: italic, bold, light, extra light, extra bold)

screenshot of the Pinyin fonts above

Note:

  • Neuton has relatively weak tone marks, so I wouldn’t recommend it for Web pages aimed at beginning students of Mandarin.

Sans Serifs

  • Andika (1.4 MB)
  • Ubuntu (350 KB) — available in eight styles

screenshot of the Pinyin fonts above

Some Ubuntu sample PDFs: Ubuntu regular, Ubuntu italic, Ubuntu bold, Ubuntu bold italic, Ubuntu light, Ubuntu light italic, Ubuntu medium, Ubuntu medium italic.

Andika sample PDF.

Note:

  • Andika’s relatively large size (1.4 MB) makes it unsuitable for @font-face use because of download time. (Its license, however, would permit someone with the time and energy to crack it open and remove lots of the glyphs not needed for Pinyin, thus reducing the size.) More fundamentally, though, I don’t much like the look of it; but YMMV.

Since Google is likely to expand the number of fonts it offers, I’m including the list of all 29 faces I tried for this experiment, which should make it easier for those wanting to test only new fonts. (It is possible, however, that Pinyin support will be added later to some fonts that fail in this area now. If anyone hears of any such changes, please let me know.) Only the six families discussed above (EB Garamond, Gentium Basic, Gentium Book Basic, Neuton, Andika, and Ubuntu) support Pinyin; everything else failed.

Display Faces with Latin Extended (all fail)

  • Abril Fatface
  • Forum
  • Kelly Slab
  • Lobster
  • MedievalSharp
  • Modern Antiqua
  • Ruslan Display
  • Tenor Sans

Handwriting Faces with Latin Extended (all fail)

  • Patrick Hand

Serif Faces with Latin Extended

  • Cardo
  • Caudex
  • EB Garamond
  • Gentium Basic
  • Gentium Book Basic
  • Neuton
  • Playfair Display
  • Sorts Mill Goudy

Sans Serif Faces with Latin Extended

  • Andika
  • Anonymous Pro
  • Anton
  • Didact Gothic
  • Francois One
  • Istok Web
  • Jura
  • Open Sans
  • Open Sans Condensed
  • Play
  • Ubuntu
  • Varela

Additional resource: SIL Fonts for downloading (including the full versions of Andika and Gentium).