How to add tone marks to Pinyin automatically, sort of

Pinyin text without and with tone marks

There are plenty of ways to type Hanyu Pinyin with tone marks. These usually involve typing the tone number after the vowel in question or entering a series of special keystrokes to produce the tone mark.
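If you already know the tones, converting tone numbers to tone marks is mechanical. It follows a well-established placement rule: a or e takes the mark if either is present, o takes it in ou, and otherwise the last vowel does. Here is a minimal Python sketch of that rule. The function names are hypothetical. It assumes input such as Zhong1guo2, with v standing in for ü. It is not how Google Translate or any particular input method works.

```python
import re

# Tone-marked forms of each vowel, in order of tones 1 to 4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable: str, tone: int) -> str:
    """Put the mark for `tone` on the correct vowel of a single syllable."""
    # "v" is the usual keyboard stand-in for "ü".
    syllable = syllable.replace("v", "ü").replace("V", "Ü")
    if tone not in (1, 2, 3, 4):  # 5 or 0 = neutral tone: no mark
        return syllable
    lower = syllable.lower()
    # Placement rule: a or e takes the mark; in "ou" the o does;
    # otherwise the last vowel of the syllable does.
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in "iouü"]
        if not vowels:
            return syllable  # e.g. a bare "m" or "ng": leave unmarked
        idx = vowels[-1]
    mark = TONE_MARKS[lower[idx]][tone - 1]
    if syllable[idx].isupper():
        mark = mark.upper()
    return syllable[:idx] + mark + syllable[idx + 1:]


def numbers_to_marks(text: str) -> str:
    """Convert numbered Pinyin such as "Zhong1guo2" to "Zhōngguó"."""
    return re.sub(
        r"([A-Za-züÜ]+)([0-5])",
        lambda m: mark_syllable(m.group(1), int(m.group(2))),
        text,
    )
```

For example, numbers_to_marks("Zhong1guo2 ren2") gives "Zhōngguó rén". The sketch assumes every digit in the input is a tone number, so it should not be run on ordinary text that contains other numbers.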

But some consider that too much mafan, or perhaps are unsure of which tones are correct. (Heads up, students learning Mandarin! This post will be useful.) So occasionally I’m asked this question:

Is there a way to type in Hanyu Pinyin and have the correct tone marks appear automatically — even without typing tone numbers or pressing additional keys? Oh, and for free too, please.

The answer is a qualified yes.

Google Translate’s Pinyin function has come a long way since its inauspicious beginning about eight years ago. For quite some time it has even offered a way to add tone marks automatically, though few people know of this function, which could still use a great deal of improvement.

To get Google Translate to produce Pinyin with tone marks as you enter text in toneless Pinyin, first you need to set the system to translate from “Chinese” to “Chinese (Traditional)” or from “Chinese” to “Chinese (Simplified)”.

Enter your text in the box and Pinyin with tone marks will appear below the box on the right.


Alas, there are some problems with the system.

A lot of perfectly normal things that are essential to proper writing in Hanyu Pinyin will cause Google Translate to break. So when adding your text, do not use any of the following:

  • capital letters
  • the letter ü (use “v” instead)
  • more than 160 characters (including spaces and punctuation) at a time

Up to 160 characters is fine

[Image: Google Translate producing Hanyu Pinyin with tone marks for a text of up to 160 characters]

But more than 160 characters will break the function that adds tone marks to Pinyin

The following are optional in terms of getting Google Translate to give you good results, though they are not optional in properly written Pinyin:

  • apostrophes
  • spaces
  • punctuation

A second significant problem is that the system doesn’t deal well with proper nouns: it fails at both word parsing and capitalization. At least it seems to recognize that proper nouns are units, even if it doesn’t write them correctly.

[Image: Google Translate failing to capitalize and parse Tian’anmen and Mao Zedong, producing tian’anmen and maozedong instead]
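Until Google fixes this, one workaround is to keep your own list of proper nouns and correct them after conversion. Below is a minimal Python sketch. The dictionary is hypothetical and would have to be maintained by hand. The sketch also assumes the tone-marked output uses a straight apostrophe (').

```python
import re

# Hypothetical, hand-maintained list: Google's output -> correct form.
PROPER_NOUNS = {
    "tiān'ānmén": "Tiān'ānmén",
    "máozédōng": "Máo Zédōng",
}


def fix_proper_nouns(text: str) -> str:
    """Replace known mis-written proper nouns, matching whole words only."""
    for wrong, right in PROPER_NOUNS.items():
        # Don't match inside a longer word or across an apostrophe.
        pattern = rf"(?<![\w']){re.escape(wrong)}(?![\w'])"
        text = re.sub(pattern, lambda _m, r=right: r, text)
    return text
```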

So although Google Translate won’t handle everything for you, it can nevertheless be a useful tool for including tone marks in Hanyu Pinyin.

Zhou Youguang, 1906-2017

Zhou Youguang

Zhou Youguang, who is often called the “father of Hanyu Pinyin,” died earlier today.

He lived to the age of 111. He was “the man God forgot,” he liked to joke. And he did like to laugh. His sense of humor, which he kept despite some of the trials he suffered, no doubt helped him flourish so long.

He was most remarkable, however, not for his longevity but for his monumental contribution to literacy, his dedication to helping others, and his sense of justice.

I’ll add more information later.

RIP.

Taipei to spend NT$300 million making MRT signage worse

Taipei MRT station
Commonwealth Magazine (Tiānxià zázhì) recently interviewed me for a Mandarin-language piece related to the signage on Taipei’s MRT system.

As anyone who has looked at Pinyin News more than a couple of times over the years should be able to guess, I had a lot to say about that — most of which understandably didn’t make it into the article. For example, I recall making liberal use of the word “bèn” (“stupid”) to describe the situation and the city’s approach. But the reporter — Yen Pei-hua (Yán Pèihuá / 嚴珮華), who is perhaps Taiwan’s top business journalist — diplomatically omitted that.

The article discusses the nicknumbering system that Taipei is determined to implement “for the foreigners” (even though most foreigners are at best indifferent to it), but it doesn’t include my remarks on the subject. So I’ll refer you to my post on this from last year: Taipei MRT moves to adopt nicknumbering system. Back then, though, I didn’t know the staggering amount of money the city is going to spend on screwing up the MRT system’s signs: NT$300 million (about US$10 million)! The main reason given for this is the sports event Taipei will host next summer. That event is supposed to last about ten days, which would put the cost of the signs alone at about US$1 million per day.
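The per-day figure is simple arithmetic. Here is a quick check in Python. It assumes an exchange rate of roughly NT$30 to US$1, which is an approximation, not an official rate.

```python
signage_cost_ntd = 300_000_000  # NT$300 million
ntd_per_usd = 30                # assumed approximate exchange rate
event_days = 10                 # "about ten days"

cost_usd = signage_cost_ntd / ntd_per_usd
cost_per_day_usd = cost_usd / event_days
print(f"US${cost_usd:,.0f} total, US${cost_per_day_usd:,.0f} per day")
# prints: US$10,000,000 total, US$1,000,000 per day
```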

On the other hand, the city does not plan to fix the real problems with the Taipei MRT’s station names, specifically the lack of apostrophes in what should be written Qili’an (not Qilian), Da’an (not Daan) (twice!), Jing’an (not Jingan), and Yong’an (not Yongan) — in Chinese characters: 唭哩岸, 大安, 景安, and 永安, respectively. And then there’s the problem of wordy English names.
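The apostrophe rule these station names break is mechanical. In Hanyu Pinyin, when a syllable beginning with a, o, or e follows another syllable within the same word, an apostrophe marks the boundary. Without it, Qilian reads as qi-lian rather than qi-li-an. Below is a minimal Python sketch of the rule. The function name is hypothetical. The sketch assumes the word has already been split into syllables, which is the hard part.

```python
# Syllable-initial vowels (plain and tone-marked) that trigger an apostrophe.
A_O_E = set("aoeāáǎàōóǒòēéěè")


def join_syllables(syllables: list[str], apostrophe: str = "'") -> str:
    """Join a word's syllables, adding an apostrophe before any non-initial
    syllable that begins with a, o, or e."""
    word = syllables[0]
    for syl in syllables[1:]:
        if syl[0].lower() in A_O_E:
            word += apostrophe
        word += syl
    return word
```

For example, join_syllables(["Qi", "li", "an"]) gives Qili'an (唭哩岸), and join_syllables(["Zhong", "guo"]) gives Zhongguo, with no apostrophe.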

Well, take a look and comment — here, or better still, on the Facebook page. (Links below.) I’m grateful to Ms. Yen and Commonwealth for discussing the issue.

References:

Aiyo! OED fails to use Pinyin for some new entries

The Oxford English Dictionary has just added some new entries, including several from Sinitic languages.

A lot of these come by way of Singapore and so reflect the Hokkien language. For example, among the new entries is “ang pow,” which is Hokkien’s equivalent of Mandarin’s “hongbao,” which also made the list.

A few of the entries, however, come from Mandarin, for example two common interjections for surprise. Oddly, though, the OED uses “aiyoh” and “aiyah” instead of their proper Pinyin spellings of “aiyo” and “aiya.”

“Ah,” you say, “but maybe the aiyoh and aiyah spellings are more common in English.”

Nope.

Even in Singapore domains (.sg), the Pinyin spellings are more common than those the OED calls for. As the tables below show, in every instance the Pinyin spellings are also more common in Hong Kong, China, and Taiwan. Throughout the world, the Pinyin spellings are more common — the vast majority of the time by a factor of at least two.

Google search results for “aiyo” (Pinyin) and “aiyoh” (spelling used in the OED)

Domain                           aiyo      aiyoh
.sg                            12,200      5,680
.hk                             2,570        187
.cn                             6,040        984
.tw                             4,690        196
all domains                 1,250,000    137,000
all domains + “chinese”        97,700     77,100
all domains + “mandarin”       51,800     14,100

Google search results for “aiya” (Pinyin) and “aiyah” (spelling used in the OED)

Domain                           aiya      aiyah
.sg                            17,600      8,310
.hk                             6,400      2,360
.cn                            13,200      1,860
.tw                             5,910      1,710
all domains                 3,370,000    332,000
all domains + “chinese”       238,000     63,200
all domains + “mandarin”       36,500     22,800
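The claims above can be checked directly against the tables. This Python snippet restates the hit counts shown, which are Google's estimates at the time of the search and not exact figures. It then computes how often the Pinyin spelling wins.

```python
# Hit counts from the two tables: (Pinyin spelling, OED spelling).
counts = {
    "aiyo vs aiyoh": {
        ".sg": (12_200, 5_680), ".hk": (2_570, 187), ".cn": (6_040, 984),
        ".tw": (4_690, 196), "all domains": (1_250_000, 137_000),
        'all + "chinese"': (97_700, 77_100), 'all + "mandarin"': (51_800, 14_100),
    },
    "aiya vs aiyah": {
        ".sg": (17_600, 8_310), ".hk": (6_400, 2_360), ".cn": (13_200, 1_860),
        ".tw": (5_910, 1_710), "all domains": (3_370_000, 332_000),
        'all + "chinese"': (238_000, 63_200), 'all + "mandarin"': (36_500, 22_800),
    },
}

# Ratio of Pinyin hits to OED-spelling hits for every row.
ratios = [pinyin / oed for table in counts.values() for pinyin, oed in table.values()]

print(f"Pinyin spelling more common in {sum(r > 1 for r in ratios)} of {len(ratios)} rows")
print(f"At least twice as common in {sum(r >= 2 for r in ratios)} of {len(ratios)} rows")
# prints 14 of 14, then 12 of 14
```

The two rows below a factor of two are aiyo with “chinese” (about 1.3) and aiya with “mandarin” (about 1.6).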

Searching Google Books also reveals that the Pinyin forms are more common.

In short, I do not see any good reason for the OED to have adopted ad hoc spellings rather than the Pinyin standard. They must have their reasons, but it looks like they botched this.

Shanghai considers deleting Pinyin from street signs

The Shanghai Road Administration Bureau is considering removing Hanyu Pinyin from street signs in the city.

Typically, the bureau’s division chief, Wang Weifeng, seems to be confused about the difference between Pinyin and English. He also justifies the move by claiming that larger Chinese characters would benefit Chinese citizens, ignoring the many people in China who are largely illiterate.

“Of course we will keep the English-Chinese traffic signs around some special areas, such as the tourism spots, CBD areas and some transport hubs,” Wang said.

A German newspaper article notes:

Ob sie die Umschrift wortwörtlich „aus dem Verkehr“ zieht, will Schanghai angeblich von einer „Umfrage“ unter „Anwohnern“ abhängig machen, ebenso vom Urteil nicht näher genannter „Experten“. Dies ist eine gängige Formulierung, wenn chinesische Regierungsstellen ihren einsamen Entscheidungen einen basisdemokratischen Anstrich geben wollen.

[Translation: Shanghai supposedly wants to make its decision on whether to literally take the romanization “out of circulation” depend on a “survey” of “residents,” as well as on the judgment of unnamed “experts.” This is a common formulation when Chinese government agencies want to give their unilateral decisions a veneer of grassroots democracy.]

This is a situation all too common in Taiwan as well, such as in Taipei’s misguided move to apply nicknumbering to subway stops. “Experts” — ha!

Shanghai’s survey on Pinyin use and signage is, of course, in Mandarin only, with no English. The poll ends on August 30 (next week!), so add your views soon.

So far, public opinion seems to be largely against removing Hanyu Pinyin from signs. But that doesn’t mean it won’t happen anyway. After all, Shanghai has its “experts” on the case. Heh.

If Shanghai really wants to improve the legibility of its signs, it should consider using word parsing even for text in Chinese characters. For example:

  • use 陕西 南路, not 陕西南路
  • use 斜土 路, not 斜土路
  • use 建国 西路, not 建国西路

That would also permit the use of superscript on the generic parts of names (e.g., “南路”) to save space. This could also be done with the Pinyin/English, with the Pinyin in large letters and the English “Rd” etc. in superscript.
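For common cases, splitting a name into its specific part and its generic part is easy to automate. Below is a minimal Python sketch that uses a short, hypothetical list of generic endings. A real sign system would need a fuller list. It would also need a way to handle names in which one of these characters belongs to the specific part.

```python
# Hypothetical list of generic endings; a real system would need more.
GENERIC_PARTS = ["东路", "西路", "南路", "北路", "中路", "大道", "路", "街"]


def split_street_name(name: str) -> tuple[str, str]:
    """Split a street name into (specific part, generic part)."""
    # Try longer endings first so that "南路" wins over "路".
    for generic in sorted(GENERIC_PARTS, key=len, reverse=True):
        if name.endswith(generic) and len(name) > len(generic):
            return name[: -len(generic)], generic
    return name, ""  # no known generic part: leave the name whole


def with_word_space(name: str) -> str:
    """Return the name with a space between its specific and generic parts."""
    specific, generic = split_street_name(name)
    return f"{specific} {generic}" if generic else specific
```

Once the two parts are separated this way, the generic part could be set in superscript on a sign.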

Thanks to Michael Cannings for the tip.

sources:

Pinyin writing contest — cash prizes

This is big news. I am thrilled to help announce the Li-ching Chang Memorial Pinyin Literature Contest.

A total of more than US$13,000 will be awarded to the winners. Prizes will be given to the top three winners in each of the following categories:

  • novella
  • short story
  • essay
  • poem

You need not be a native speaker of Mandarin to enter. But keep in mind that this is a literature contest: Entries should be aimed at an audience of adult, fluent speakers of Mandarin. Entries should not be written at a level for children or those learning Mandarin.

Furthermore, entries should be composed in Hanyu Pinyin, not in Chinese characters and then converted. This is crucial, as the style associated with Chinese characters is often not compatible with Mandarin as it is spoken. So here’s a chance to let the real Mandarin language shine through in writing — and for writers to win some money.

Please spread the word around.

For further details, see the contest’s FAQ.

Pinyin.com domain changes hands for a six-figure sum

Don’t get excited. I’m still here. It’s pinyin.com, not pinyin.info, that changed hands.

Anyway, Pinyin-related domain names are “hot” these days. The pinyin.com domain name reportedly sold recently for a six-figure sum. Since the story was on a Chinese website, I suppose that’s a six-figure sum in yuan, not U.S. dollars, which would put the price somewhere between US$15,000 and US$150,000.
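The range comes from treating a six-figure sum as anything from 100,000 to 999,999 yuan. Here is the conversion in Python. It assumes a rate of roughly 6.5 yuan to the U.S. dollar, which is approximately the rate in early 2016. That rate is an assumption, not a figure from the story.

```python
yuan_per_usd = 6.5  # assumed approximate early-2016 exchange rate

low_usd = 100_000 / yuan_per_usd   # smallest six-figure sum in yuan
high_usd = 999_999 / yuan_per_usd  # largest six-figure sum in yuan
print(f"US${low_usd:,.0f} to US${high_usd:,.0f}")
# prints: US$15,385 to US$153,846
```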

What irks me a bit is the story’s labeling of pinyin.com as the “real” pinyin domain name (“zhēnzhèng de ‘pīnyīn’ yùmíng”). Bah. Humbug.

If you readers ever check in one day to discover that my site has become permanently plastered with advertising, crappy content, and cutesy cartoons, it won’t be because I sold out for a mere six-figure amount in Chinese yuan. After all, a truly world-class collection of Hello Kitty memorabilia would cost me a lot more than just that. A man’s gotta maintain standards, you know.

source: Zhè cái shì zhēnzhèng de “pīnyīn” — mǐ yǒu zhōng 6 wèi mǎi yùmíng pinyin.com (这才是真正的“拼音” 米友中六位买域名 pinyin.com), eName.cn, February 20, 2016

Languages, scripts, and signs: a walk around Taipei’s Shixin University

Recently I took some trails through the mountains in Taipei and ended up at Shih Hsin University (Shìxīn Dàxué / 世新大學). Near the school are some interesting signs. Rather than giving each of these its own post, I’m keeping the signs together in this one, as this is better testimony to the increasing and often playful diversity of languages and scripts in Taiwan.

Cǎo Chuàn

Here’s a restaurant whose name is given in Pinyin with tone marks! That’s quite a rarity here, though I suspect we’ll be seeing more of this in the future. The name in Chinese characters (草串) can be found, much smaller, on a separate sign below.


二哥の牛肉麵

Right by Cao Chuan is Èrgē de Niúròumiàn (Second Brother’s Beef Noodle Soup). Note the use of the Japanese の rather than Mandarin’s 的; this is quite common in Taiwan.


芭樂ㄟ店

This store has an ㄟ, which serves as a marker of the Taiwanese language. Here, ㄟ is the equivalent of 的 — and of の.

Bālè ei diàn

A’Woo Tea Bar


I couldn’t find a name in Chinese characters for this place. The name is probably onomatopoeia, as in “Werewolves of London — awoo!”