Google Web fonts and Pinyin — December 2011 update

When I put up my first post on Google Web fonts (Google Web fonts and Hanyu Pinyin), that site offered 252 font families, 29 of which cover at least parts of Latin Extended. Now, some three months later, the total has grown to 342 font families, with 70 of those covering at least parts of Latin Extended.

Only two of the new families, however, support Hanyu Pinyin with tone marks: Ubuntu Condensed and Ubuntu Mono. That brings the total to eight Google Web fonts that support Hanyu Pinyin: four serifs and four sans serifs.

Serif

  • EB Garamond
  • Gentium Basic
  • Gentium Book Basic
  • Neuton

Sans Serif

  • Andika
  • Ubuntu
  • Ubuntu Condensed
  • Ubuntu Mono

Here’s what the two new families, Ubuntu Condensed and Ubuntu Mono, look like next to the earlier Ubuntu.

example of Ubuntu, Ubuntu Condensed, and Ubuntu Mono in action on Hanyu Pinyin

For reference, here’s the full list of families offering Latin Extended. Only the eight named above are Pinyin-compliant.

Serif Faces

  1. Bitter
  2. Cardo
  3. Caudex
  4. EB Garamond
  5. Enriqueta
  6. Gentium Basic
  7. Gentium Book Basic
  8. Neuton
  9. Playfair Display
  10. Radley
  11. Sorts Mill Goudy

Sans Serif Faces

  1. Andika
  2. Anonymous Pro
  3. Anton
  4. Chango
  5. Didact Gothic
  6. Francois One
  7. Fresca
  8. Istok Web
  9. Jockey One
  10. Jura
  11. Marmelad
  12. Open Sans Condensed
  13. Open Sans
  14. Play
  15. Signika Negative
  16. Signika
  17. Tenor Sans
  18. Ubuntu
  19. Ubuntu Condensed
  20. Ubuntu Mono
  21. Varela
  22. Viga

Display Faces (all fail)

  1. Abril Fatface
  2. Arbutus
  3. Bubblegum Sans
  4. Butcherman Caps
  5. Chicle
  6. Eater Caps
  7. Forum
  8. Kelly Slab
  9. Knewave
  10. Lobster
  11. MedievalSharp
  12. Modern Antiqua
  13. Nosifer Caps
  14. Piedra
  15. Passion One
  16. Plaster
  17. Rammetto One
  18. Ribeye Marrow
  19. Ribeye
  20. Righteous
  21. Ruslan Display
  22. Stint Ultra Condensed

Handwriting Faces (all fail)

  1. Aguafina Script
  2. Aladin
  3. Devonshire
  4. Dr Sugiyama
  5. Fondamento
  6. Herr Von Muellerhoff
  7. Marck Script
  8. Miss Fajardos
  9. Miss Saint Delafield
  10. Monsieur La Doulaise
  11. Mr Bedford
  12. Mr Dafoe
  13. Mr De Haviland
  14. Mrs Sheppards
  15. Niconne
  16. Patrick Hand

Google improves its maps of Taiwan

Two years ago when Google switched to Hanyu Pinyin in its maps of Taiwan, it did a poor job … despite the welcome use of tone marks.

Here are some of the problems I noted at the time:

  • The Hanyu Pinyin is given as Bro Ken Syl La Bles. (Terrible! Also, this is a new style for Google Maps. Street names in Tongyong were styled properly: e.g., Minsheng, not Min Sheng.)
  • The names of MRT stations remain incorrectly presented. For example, what is referred to in all MRT stations and on all MRT maps as “NTU Hospital” is instead referred to in broken Pinyin as “Tái Dà Yī Yuàn” (in proper Pinyin this would be Tái-Dà Yīyuàn); and “Xindian City Hall” (or “Office”; bleah) is marked as Xīn Diàn Shì Gōng Suǒ (in proper Pinyin: “Xīndiàn Shìgōngsuǒ” or perhaps “Xīndiàn Shì Gōngsuǒ”). Most but not all MRT stations were already presented this incorrect way (in Hanyu Pinyin rather than Tongyong) in Google Maps.
  • Errors in romanization point to sloppy conversions. For example, an MRT station in Banqiao is labeled Xīn Bù rather than as Xīnpǔ. (埔 is one of those many Chinese characters with multiple Mandarin pronunciations.)
  • Tongyong Pinyin is still used in the names of most cities and townships (e.g., Banciao, not Banqiao).

I’m pleased to report that Google Maps has recently made substantial improvements.

First, and of fundamental importance, word parsing has finally been implemented for the most part. No more Bro Ken Syl La Bles. Hallelujah!

Here’s what this section of a map of Tainan looked like two years ago:

And here’s how it is now:

Oddly, “Jiànxīng Jr High School” has been changed to “Tainan Municipal Chien-Shing Jr High School Library”, which is wordy, misleading (library?), and in bastardized Wade-Giles (misspelled bastardized Wade-Giles, at that). And “Girl High School” still hasn’t been corrected to “Girls’ High School”. (We’ll also see that problem in the maps for Taipei.)

But for the most part things are much better, including (at last!) a correct apostrophe: Yǒu’ài St.

As these examples from Taipei show, the apostrophe isn’t just a one-off. Someone finally got this right.

Rén’ài, not Renai.
screenshot from Google Maps, showing how the correct Rén'ài (rather than the incorrect Renai) is used

Cháng’ān, not Changan.
screenshot from Google Maps, showing how the correct Cháng'ān is used

Well, for the most part right. Here we have the correct Dà’ān (and correct Ruì’ān) but also the incorrect Daan and Ta-An. But at least the street names are correct.
click for larger screenshot from Google Maps, showing how the correct Dà'ān (and correct Ruì'ān) is used but also the incorrect Daan and Ta-An

Second, MRT station names have been fixed … mostly. Most all MRT station names are now in the mixture of romanization and English that Taipei uses, with Google Maps also unfortunately following even the incorrect ones. A lot of this was fixed long ago. The stops along the relatively new Luzhou line, however, are all written wrong, as one long string of Pinyin.

To match the style used for other stations, this should be MRT Songjiang Nanjing, not Jieyunsongjiangnanjing.
screenshot from Google Maps, showing how the Songjiang-Nanjing MRT station is labeled 'Jieyunsongjiangnanjing Station' (with tone marks)

Third, misreadings of poyinzi (pòyīnzì/破音字) have largely been corrected.

Chéngdū, not Chéng Dōu.
screenshot from Google Maps, showing how the correct 'Chéngdū Rd' is used

Like I said: have largely been corrected. Here we have the correct Chéngdū and Chóngqìng (rather than the previous maps’ Chéng Dōu and Zhòng Qìng) but also the incorrect Houbu instead of the correct Houpu.
screenshot from Google Maps, showing how the correct Chóngqìng Rd and Chéngdū St are used but also how the incorrect Houbu (instead of Houpu) is shown

But at least the major ones are correct.
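
Misreadings of this sort are what you would expect when each character is converted on its own, using its most common reading, instead of being looked up as part of a word. Here is a minimal sketch of the difference. The tables and the function name are made up for illustration, the "default" readings simply mirror the errors described above, and nothing here is meant to describe how Google actually does its conversion.

```python
# Toy data, for illustration only: per-character default readings
# (chosen to reproduce the errors described above) and a small word
# list whose readings take priority over them.
CHAR_READINGS = {
    "成": "chéng", "都": "dōu",   # 都 is read dū in 成都
    "重": "zhòng", "慶": "qìng",  # 重 is read chóng in 重慶
    "新": "xīn", "埔": "bù",      # 埔 is read pǔ in 新埔
}
WORD_READINGS = {
    "成都": "Chéngdū",
    "重慶": "Chóngqìng",
    "新埔": "Xīnpǔ",
}


def to_pinyin(text: str, use_words: bool = True) -> str:
    """Convert text to Pinyin, optionally preferring whole-word readings."""
    out = []
    i = 0
    while i < len(text):
        match = None
        if use_words:
            # Try the longest candidate word first, down to two characters.
            for length in range(len(text) - i, 1, -1):
                candidate = text[i:i + length]
                if candidate in WORD_READINGS:
                    match = candidate
                    break
        if match:
            out.append(WORD_READINGS[match])
            i += len(match)
        else:
            # Fall back to the per-character default (or the raw character).
            out.append(CHAR_READINGS.get(text[i], text[i]))
            i += 1
    return " ".join(out)


print(to_pinyin("成都", use_words=False))  # character by character
print(to_pinyin("成都"))                   # word lookup first
```

With the word lookup switched off, the broken syllables and the wrong reading come back together, which matches what the old maps showed.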

Unfortunately, the fourth point I raised two years ago (Tongyong Pinyin instead of Hanyu Pinyin at the district and city levels) has still not been addressed. So Google is still providing Tongyong Pinyin rather than the official Hanyu Pinyin at some levels. Most of the names in this map, for example, are distinctly in Tongyong Pinyin (e.g., Lujhou, Sinjhuang, and Banciao, rather than Luzhou, Xinzhuang, and Banqiao).

Google did go in and change the labels on some places from city to district when Taiwan revised their names; but, oddly enough, the company didn’t fix the romanization at the same time. But with any luck we won’t have to wait so long before Google finally takes care of that too.

Or perhaps we’ll have a new president who will revive Tongyong Pinyin and Google will throw out all its good work.

Google Translate’s Pinyin converter: now with apostrophes

Google has taken another major step toward making Google Translate’s Pinyin converter decent. Finally, apostrophes.

Not long ago “阿尔巴尼亚然而仁爱莲藕普洱茶” would have yielded “āěrbāníyǎ ránér rénài liánǒu pǔěr chá.” But now Google produces the correct “ā’ěrbāníyǎ rán’ér rén’ài lián’ǒu pǔ’ěr chá.” (Well, one could debate whether that last one should be pǔ’ěr chá, pǔ’ěrchá, Pǔ’ěr chá, Pǔ’ěr Chá, or Pǔ’ěrchá. But the apostrophe is undoubtedly correct regardless.)
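
The rule at work is simple: within a word, a syllable that begins with a, e, or o gets an apostrophe in front of it when it follows another syllable. A minimal sketch, assuming the text has already been split into syllables and grouped into words (the function names are mine, not anything from Google):

```python
import unicodedata

APOSTROPHE = "\u2019"  # typographic apostrophe, as in rén’ài


def starts_with_a_e_o(syllable: str) -> bool:
    """True if the syllable's first letter is a, e, or o, ignoring tone marks."""
    # NFD splits a tone-marked vowel into base letter + combining mark,
    # so the first code point is the bare letter.
    base = unicodedata.normalize("NFD", syllable)[0].lower()
    return base in "aeo"


def join_syllables(syllables: list[str]) -> str:
    """Join the syllables of one word, inserting apostrophes where required."""
    if not syllables:
        return ""
    word = syllables[0]
    for syllable in syllables[1:]:
        if starts_with_a_e_o(syllable):
            word += APOSTROPHE
        word += syllable
    return word


for parts in (["rén", "ài"], ["lián", "ǒu"], ["pǔ", "ěr"], ["péng", "you", "men"]):
    print(join_syllables(parts))
```

The hard part for a converter is not this step but the one before it: deciding which syllables belong together in a word.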

Also, the -men suffix is now solid with words (e.g., 朋友们 –> péngyoumen and 孩子们 –> háizimen). This is a small thing but nonetheless welcome.

The most significant remaining fundamental problem is the capitalization and parsing of proper nouns.

And numbers are still wrong, with everything being written separately. For example, “七千九百四十三万五千六百五十八” should be rendered as “qīqiān jiǔbǎi sìshísān wàn wǔqiān liùbǎi wǔshíbā.” But Google is still giving this as “qī qiān jiǔ bǎi sì shí sān wàn wǔ qiān liù bǎi wǔ shí bā.”
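
To show the kind of regrouping involved, here is a rough sketch that turns the all-separate syllables into the grouping given above. The grouping logic is inferred from this one example only: a digit is joined to a following bǎi or qiān, the tens are written solid, and wàn stands alone. Treating yì the same way as wàn is my assumption, and the full orthography rules have more cases (líng, for one) that this does not attempt.

```python
UNITS = {"bǎi", "qiān"}   # joined to the digit in front of them
SEPARATE = {"wàn", "yì"}  # written as their own word here (yì is an assumption)


def regroup_number(syllables: list[str]) -> str:
    """Regroup separately written number syllables into words."""
    words = []
    current = ""
    for syllable in syllables:
        if syllable in SEPARATE:
            # Close whatever was being built, then emit the unit on its own.
            if current:
                words.append(current)
                current = ""
            words.append(syllable)
        elif syllable in UNITS:
            # A hundreds or thousands unit ends the current word.
            words.append(current + syllable)
            current = ""
        else:
            # Digits and shí accumulate, e.g. sì + shí + sān.
            current += syllable
    if current:
        words.append(current)
    return " ".join(words)


separate = "qī qiān jiǔ bǎi sì shí sān wàn wǔ qiān liù bǎi wǔ shí bā"
print(regroup_number(separate.split()))
```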

On the other hand, Google is starting to deal with “le”, with it being appended to verbs. This is a relatively tricky thing to get right, so I’m not surprised Google doesn’t have the details down yet.

So there’s still a lot of work to be done. But at least progress is being made in areas of fundamental importance. I’m heartened by the progress.

The current state:
screen shot of what Google Translate's Pinyin converter produces as of late September 2011

Script font for Pinyin

Unfortunately, relatively few fonts support Hanyu Pinyin (with tone marks, that is). So I was surprised to come across Pecita, by Philippe Cochy. This is the first script typeface I recall seeing that covers Pinyin … and a lot more.

It might be too individualistic for much Pinyin use. But I’m very glad to know it exists and hope to see many more creations like it.

GIF of Pecita in action: A-Z, a-z, plus the diacritics used in Pinyin and a pinyin pangram

Pecita is licensed under the SIL Open Font License, Version 1.1.

Google Web fonts and Hanyu Pinyin

Back in the last century, getting Web browsers to correctly display Pinyin was such a troublesome task that I remember once even employing GIFs of first- and third-tone letters to get those to look right. So there were a whole lotta IMG tags in my text. Sure, I put the necessary info in ALT tags (e.g., “alt='a3'”), just in case. But, still, I shudder to recall having to resort to that particular hack.

Things are better now, though still far from ideal. Not all website viewers have the font you may wish to use, and something that promises to considerably improve that situation is CSS3’s @font-face, which allows those creating Web pages to employ fonts that are provided online. Google is helping with this through its Google Web Fonts. (Current count: 252 font families.)

But is anything in Google’s collection capable of dealing with Hanyu Pinyin? Armed with a handy-dandy Pinyin pangram, I had a look at what Google has made available.

Not surprisingly, most of the 29 font families marked as offering the “Latin Extended” character set failed to handle the entire Hanyu Pinyin set. The tone-marked ü group is the most likely to be unsupported at present, with third-tone vowels also frequently missing.
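
For anyone who would rather check coverage by code point than by eyeballing a pangram, here is a minimal sketch. It builds the tone-marked vowels that Hanyu Pinyin needs and reports which ones fall outside a given set of supported code points. The function names are mine, and the two sample sets are just the Unicode blocks Latin-1 Supplement and Latin Extended-A used as stand-ins. The code knows nothing about what any actual Google font contains.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = ["\u0304", "\u0301", "\u030C", "\u0300"]
VOWELS = "aeiouü"


def pinyin_tone_letters() -> list[str]:
    """All tone-marked vowels, lower and upper case, as precomposed letters."""
    letters = []
    for base in VOWELS + VOWELS.upper():
        for mark in TONE_MARKS:
            # NFC composes base + mark into a single code point, e.g. ǚ.
            letters.append(unicodedata.normalize("NFC", base + mark))
    return letters


def missing_pinyin(supported: set[int]) -> list[str]:
    """Tone-marked vowels whose code points are not in the supported set."""
    return [ch for ch in pinyin_tone_letters() if ord(ch) not in supported]


# Stand-in coverage sets (assumptions for illustration, not real font data).
latin1 = set(range(0x20, 0x100))
latin1_plus_ext_a = latin1 | set(range(0x100, 0x180))

print(len(pinyin_tone_letters()))  # 48 letters in total
print("".join(missing_pinyin(latin1_plus_ext_a)))
```

With a real font file, the supported set could come from a font library (for instance the keys returned by fontTools’ `TTFont(...).getBestCmap()`), but that step is left out here.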

Here are the Google Web fonts that do support Hanyu Pinyin with tone marks:

Serifs

  • EB Garamond (227 KB)
  • Gentium Basic (263 KB — and about the same for each of the three accompanying styles: italic, bold, bold italic)
  • Gentium Book Basic (267 KB — and about the same for each of the three accompanying styles: italic, bold, bold italic)
  • Neuton (56 KB — and about the same for each of the five accompanying styles: italic, bold, light, extra light, extra bold)

screenshot of the Pinyin fonts above

Note:

  • Neuton has relatively weak tone marks, so I wouldn’t recommend it for Web pages aimed at beginning students of Mandarin.

Sans Serifs

  • Andika (1.4 MB)
  • Ubuntu (350 KB) — available in eight styles

screenshot of the Pinyin fonts above

Some Ubuntu sample PDFs: Ubuntu regular, Ubuntu italic, Ubuntu bold, Ubuntu bold italic, Ubuntu light, Ubuntu light italic, Ubuntu medium, Ubuntu medium italic.

Andika sample PDF.

Note:

  • Andika’s relatively large size (1.4 MB) makes it unsuitable for @font-face use because of download time. (Its license, however, would permit someone with the time and energy to crack it open and remove lots of the glyphs not needed for Pinyin, thus reducing the size.) More fundamentally, though, I don’t much like the look of it; but YMMV.

Since Google is likely to expand the number of fonts it offers, I’m including the list of all 29 faces I tried for this experiment, which should make it easier for those wanting to test only new fonts. (It is possible, however, that Pinyin support will be added later to some fonts that fail in this area now. If anyone hears of any such changes, please let me know.) Only the six families named above support Pinyin; everything else failed.

Display Faces with Latin Extended (all fail)

  • Abril Fatface
  • Forum
  • Kelly Slab
  • Lobster
  • MedievalSharp
  • Modern Antiqua
  • Ruslan Display
  • Tenor Sans

Handwriting Faces with Latin Extended (all fail)

  • Patrick Hand

Serif Faces with Latin Extended

  • Cardo
  • Caudex
  • EB Garamond
  • Gentium Basic
  • Gentium Book Basic
  • Neuton
  • Playfair Display
  • Sorts Mill Goudy

Sans Serif Faces with Latin Extended

  • Andika
  • Anonymous Pro
  • Anton
  • Didact Gothic
  • Francois One
  • Istok Web
  • Jura
  • Open Sans
  • Open Sans Condensed
  • Play
  • Ubuntu
  • Varela

Additional resource: SIL Fonts for downloading (including the full versions of Andika and Gentium).

Taiwanese romanization used for Hanzi input method

Since I just posted about the new Hakka-based Chinese character input method, I would be remiss not to note as well the release early this year of a different Chinese character input method based on Taiwanese romanization.

This one is available in Windows, Mac, and Linux flavors.

See the FAQ and documents below for more information (Mandarin only).

Táiwān Mǐnnányǔ Hànzì shūrùfǎ 2.0 bǎn xiàzài (臺灣閩南語漢字輸入法 2.0版下載) [Readers may wish to note the use of Minnan, which is generally preferred among unificationists and some advocates of Hakka and the languages of Taiwan's tribes.]

source: Jiàoyùbù Táiwān Mǐnnányǔ Hànzì shūrùfǎ (教育部臺灣閩南語漢字輸入法); Ministry of Education, Taiwan; June 16, 2010(?) / February 14, 2011(?) [Perhaps the Windows and Linux versions came first, with the Mac version following in 2011.]

Hakka romanization used for new Hanzi input method

Chinese character associated with Hakka morpheme ng?i

Taiwan’s Ministry of Education has released software for Windows and Linux systems that uses Hakka romanization for the inputting of Chinese characters.

This appears to be aimed mainly at those who wish to input Hanzi used primarily in writing Hakka, such as that shown here.

See also Taiwanese romanization used for Hanzi input method.

Key Chinese updated, adding new Pinyin features

The program Key, which offers probably the best support for Hanyu Pinyin of any software and thus deserves praise for this alone, has just come out with an update with even more Pinyin features: Key 5.2 (build: August 21, 2011 — earlier builds of 5.2 do not offer all the latest features).

Those of you who already have the program should get the update, as it’s free. But note that if you update from the site, the installer will ask you to uninstall your current version prior to putting in the update, so make sure you have your validation code handy or you’ll end up with no version at all.

(If you don’t already have Key, I recommend that you try it out. A 30-day free trial version can be downloaded from the site.)

Anyway, here’s some of what the latest version offers:

  • Hanzi-with-Pinyin horizontal layout gets preserved when copied into MS Word documents (RT setting), as well as in .html and .pdf files created from such documents.
  • Pinyin Proofing (PP) assistance: with pinyin text displayed, pressing the PP button on the toolbar will colour the background of ambiguous pinyin passages blue; right-clicking on such a blue-background pinyin passage will display the available options.
  • Copy Special: a highlighted Chinese character passage can be copied & pasted automatically in various permutations.
  • Improved number-measureword system: it now works with Chinese-character, pinyin and Arabic numerals.
  • Showing different tones through coloured characters (Language menu under Preferences).
  • Chengyu (fixed four character expression) spacing logic: automatic spacing according to the pinyin standard (Language menu under Preferences).
  • Option to show tone sandhi on grey background (Language menu under Preferences).
  • Full support of standard pinyin orthography in capitalization and spacing.
  • Automatic glossary building.

Some programs, such as Popup Chinese’s “Chinese converter,” will take Chinese characters and then produce pinyin-annotated versions, with the Pinyin appearing on mouseover. Key, however, offers something extra: the ability to produce Hanzi-annotated orthographically correct Pinyin texts (i.e., the reverse of the above). If you have a text in Key in Chinese characters, all you have to do is go to File --> Export to get Key to save your text in HTML format.

Here’s a sample of what this looks like.

Běn biāozhǔn guīdìngle yòng《 Zhōngwén pīnyīn fāng’àn》 pīnxiě xiàndài Hànyǔ de guīzé。 Nèiróng bāokuò fēncí liánxiě fǎ、 chéngyǔ pīnxiěfǎ、 wàiláicí pīnxiěfǎ、 rénmíng dìmíng pīnxiěfǎ、 biāodiào fǎ、 yíháng guīzé děng。

Basically, this is a “digraphia export” feature — terrific!

If you want something like the above, you do not have to convert the Hanzi to orthographically correct Pinyin first; Key will do it for you automatically. (I hope, though, that they’ll fix those double-width punctuation marks one of these days.)

Let’s say, though, that you want a document with properly word-parsed interlinear Hanzi and Pinyin. Key will do this too. Input a Hanzi text in Key, then highlight the text (CTRL + A) and choose Format --> Hanzi with Pinyin / Kanji-Kana with Romaji.

In the window that pops up, choose Hanzi with Pinyin / Kanji-kana with Romaji / Hangul with Romanization from the Two-Line Mode section and Show all non-Hanzi symbols in Pinyin line from Options. The results will look something like this:

GIF of a screenshot from Key, showing an interlinear text with word-parsed Pinyin above Chinese characters. This is an image of the text after being pasted into Microsoft Word.

This can be extremely useful for those authoring teaching materials.

Furthermore, such interlinear texts can be copied and pasted into Word. For the interlinear-formatted copy-and-paste into Word to work properly, Key must be set to rich text format, so before selecting the text you wish to use click on the button labeled RT. (Note yellow-highlighted area in the image below.)

screenshot identifying the location of the button that needs to be pressed to make the text RTF