Microsoft Translator and Pinyin

screenshot of the text described in the post, as treated by Microsoft Translator

If supplied with the following,

談中國的“語”和“文”的問題,我覺得最好能先了解一下在中國通用的語言。中國的主要語言有哪些?為甚麼我說這個,而不說那個?因為環境?因為被強迫?因為我愛這個語言?因為有必要?因為這個語言很重要?也想想什麼是中國人的共同語言。用一個共同語言有必要嗎?為什麼?別的漢語的去向會怎麼樣?如果你使用中國的共同語言普通話,你了解這個語言的語法(比如“的, 得, 地“ 和“了” 的不同用法)嗎? 知道這個語言的基本音節(不包括聲調)只有408個嗎?

Microsoft Translator produces the following Hanyu Pinyin:

tán zhōngguóde “yǔ” hé “wén”dewèntí, wǒjuéde zuìhǎo néng xiānliǎojiě yì xiàzài zhōngguó tōngyòng de yǔyán。 zhōngguóde zhǔyào yǔyán yǒu nǎxiē? wèishénme wǒshuō zhège ,érbùshuōnàgè? yīnwéi huánjìng? yīnwéi bèi qiǎngpò? yīnwéi wǒài zhège yǔyán? yīnwéi yǒubìyào? yīnwéi zhège yǔyán hěnzhòngyào? yě xiǎngxiǎng shénmeshì zhōngguórén de gòngtóngyǔyán。 yòng yígè gòngtóngyǔyán yǒubìyào ma? wèishénme? biéde hànyǔ de qùxiàng huì zěnmeyàng? rúguǒnǐ shǐyòng zhōngguóde gòngtóngyǔyán pǔtōnghuà , nǐ liǎojiě zhège yǔyán de yǔfǎ ( bǐrú “de,dé, de ”hé“le” de bùtóng yòngfǎ )ma? zhīdào zhège yǔyán de jīběn yīnjié (bùbāokuòshēngtiáo) zhǐyǒu 408gèma?

This has a number of obvious problems:

  • failure to capitalize the first letter in a sentence
  • failure to capitalize proper nouns (e.g., “zhongguo” should be “Zhongguo”) (Here is how to handle proper nouns in Pinyin.)
  • frequent appending of “de” to the word before it (Here is how to handle de in Pinyin.)
  • incorrect punctuation, e.g., commas, periods, parentheses, and question marks were not converted from their double-width (i.e., Chinese character) forms to regular roman forms (“,。?()” should appear instead as “,.?()”)
  • incorrect word parsing (sometimes)
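The two most mechanical items on that list, punctuation and sentence-initial capitals, can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how Microsoft Translator works; the function names are my own invention. Proper nouns, "de," and word parsing are harder, because they require knowledge of the words themselves.

```python
import re

# Full-width (Chinese) punctuation mapped to its regular roman form.
# Escapes are used so the characters survive copy-and-paste intact.
FULLWIDTH_TO_ASCII = str.maketrans({
    "\uff0c": ",",  # full-width comma
    "\u3002": ".",  # ideographic full stop
    "\uff1f": "?",  # full-width question mark
    "\uff08": "(",  # full-width left parenthesis
    "\uff09": ")",  # full-width right parenthesis
})


def normalize_punctuation(text: str) -> str:
    """Replace full-width punctuation with regular roman forms."""
    return text.translate(FULLWIDTH_TO_ASCII)


def capitalize_sentences(text: str) -> str:
    """Capitalize the first letter of the text and of each new sentence.

    A sentence start is taken to be the beginning of the text, or the
    first letter after ., ?, or ! and any whitespace. Sentences that
    open with a quotation mark are left alone in this simple sketch.
    """
    return re.sub(
        r"(^|[.?!]\s*)([^\W\d_])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


if __name__ == "__main__":
    sample = "y\u00f2ng y\u00edg\u00e8 g\u00f2ngt\u00f3ng y\u01d4y\u00e1n y\u01d2u b\u00ecy\u00e0o ma\uff1f w\u00e8ish\u00e9nme\uff1f"
    print(capitalize_sentences(normalize_punctuation(sample)))
```

Run on the short sample, this turns the full-width question marks into regular ones and capitalizes "Yòng" and "Wèishénme."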

In short: Thumbs-down for now. But it might not take too much work for Microsoft to make this significantly better.

Japan to add romanization to names on My Number cards

The Japanese government has reportedly decided to add romanization for names on My Number cards, starting next year (2024). My Number cards, also known as Individual Number cards (or kojin bangō kādo / 個人番号カード), are a form of national ID.

Here’s basically what they look like now (without a space for romanization):
blank My Number card

But I haven’t been able to find any more specific information yet.

I wrote to the authorities in charge of My Number cards for clarification. I wanted to know what romanization system My Number cards will use: Hepburn, Kunrei-shiki, or something else? Or will people be able to choose any system they want or to choose from a list of government-approved systems?

I also requested links to any articles/announcements about this in English or Japanese.

Unfortunately, the person who politely responded did not have any information about this beyond what I submitted.

Source: one small mention at the end of this article: Pronunciation of Japanese Personal Names to be Regulated by Planned Law Revision, Japan News (from the Yomiuri Shimbun), February 18, 2023.

Chunghwa, Chunghua, Zhonghua

My previous post on postage stamps with Bopomofo (Zhuyin fuhao) mentioned Taiwan’s postal service, Chunghwa Post, which is terrifically efficient at delivering mail but which made an odd choice in romanization in its English name however many years ago. The Mandarin is Zhōnghuá Yóuzhèng in Hanyu Pinyin. But the post office spells its name

Chunghwa

logo for Chunghwa Post Co., Ltd

Chung is clearly Wade-Giles. (It probably would be bastardized Wade-Giles; but in this case chung rather than ch’ung is correct – so, luck of the draw.) Yet hwa does not exist in Wade-Giles, which uses hua. So where is that hwa coming from? The only system that uses hwa and has been official in Taiwan is Gwoyeu Romatzyh.

The Yale system, devised by George Kennedy, also uses hwa; but despite occasional confusion by reporters and others, Taiwan has never used the Yale system. What many people mistakenly believe is Yale is in fact MPS2.

I’m afraid, though, that I don’t have a definitive answer for how Taiwan ended up with the portmanteau spelling of Chunghwa. I suspect that what happened is that the initial intention was to go with the country’s official romanization system, which, way back when, was Gwoyeu Romatzyh (“GR” for short), even if you wouldn’t know that from signage or maps or just about anything but long-distance buses. But using GR would have yielded Jonghwa, which would likely have struck people accustomed to seeing 中 romanized as chung as “looking weird” (even though chung is hardly an intuitive spelling for native speakers of English for what is zhong in Hanyu Pinyin). So they kept the chung but then went ahead with hwa, which is not so different from Wade-Giles’s hua. At least that’s my guess, based on having followed romanization in Taiwan for decades.
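To make the mismatch concrete, here is a small sketch in Python that checks each half of the name against the three systems discussed above. The table holds only the spellings already given in this post (tone marks omitted); the function names are hypothetical and exist just for this illustration.

```python
# Spellings of the two syllables of 中華 in each system, as given in
# the post (tone marks omitted).
SPELLINGS = {
    "Hanyu Pinyin": ("zhong", "hua"),
    "Wade-Giles": ("chung", "hua"),
    "Gwoyeu Romatzyh": ("jong", "hwa"),
}


def systems_matching(syllable_index: int, spelling: str) -> list[str]:
    """Return the systems that spell the given syllable this way."""
    return [
        name
        for name, syllables in SPELLINGS.items()
        if syllables[syllable_index] == spelling.lower()
    ]


def is_single_system(first: str, second: str) -> bool:
    """True if at least one system produces both spellings."""
    return bool(
        set(systems_matching(0, first)) & set(systems_matching(1, second))
    )


if __name__ == "__main__":
    print(systems_matching(0, "Chung"))        # only Wade-Giles
    print(systems_matching(1, "hwa"))          # only Gwoyeu Romatzyh
    print(is_single_system("Chung", "hwa"))    # no single system gives both
```

Each half of Chunghwa belongs to exactly one of these systems, but not the same one, which is what makes the spelling a hybrid.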

The odd choice of Chunghwa is not limited to just the postal system. The main telephone system uses it as well: Chunghwa Telecom.

logo for Chunghwa Telecom

If Taiwan ever gets a broader rectification of names under which the Republic of China (Zhōnghuá Mínguó) — not to be confused with the People’s Republic of China (Zhōnghuá Rénmín Gònghéguó) — is simply called “Taiwan,” that would likely remove the issue. The spelling of Taiwan is certainly standard and the same across most romanization systems – with the notable exception of Gwoyeu Romatzyh, which would give us Tairuan. (GR’s fuunny sperlinqs strike again!)

Pinyin, US trademark law, and myths about Chinese characters

芝麻 vs. ZHIMA

The Mandarin word for “sesame” is zhīma (written “芝麻” in Chinese characters). That’s all the Mandarin anyone will need to know for this post. But if any of you non-Mandarin speakers are curious, an approximate pronunciation would be the je in jerk + ma (with the a as in father).

OK, let’s get into it now.

Everyone knows open sesame from the story of Ali Baba and the forty thieves, thought Jack Ma, when he was deciding upon a name for his new company. Alibaba Group Holding Limited is now one of China’s and indeed one of the world’s largest companies. So it’s no surprise that “open sesame” and just plain ol’ “sesame” are still very much associated with the company. And yet the company was acting as if this were not so, at least when it comes to Pinyin.

The U.S. Patent and Trademark Office’s Trademark Trial and Appeal Board recently issued a final ruling against a trademark application by Advanced New Technologies Co. (hereafter “Applicant”), which was acting on behalf of Alibaba. The mark applied for was “ZHIMA” (as such). The application (serial no. 86832288) was originally filed on November 25, 2015; Applicant requested reconsideration after earlier rejections.

The trademark office has a longstanding rule that trademark applications must, “if the mark includes non-English wording,” include “an English translation of that wording.” But Alibaba didn’t want to do that. The U.S. trademark board ruling lists some of the claims put forth by those arguing for Alibaba.

Applicant refused to submit the required statement for the following reasons:

  1. There are no Chinese characters (or other non-Latin characters) in Applicant’s Mark;
  2. A purported meaning of Chinese characters (or any nonLatin characters of even designs or stylizations) cannot be attached to a mark that does not contain such characters);
  3. Even if similar lettering is used as a transliteration of Chinese characters, Applicant’s Mark, ZHIMA – the only wording at issue – is not a transliteration of Chinese characters;
  4. Applicant’s Mark ZHIMA is not a translation of Chinese characters;
  5. Applicant’s Mark does not mean “sesame” in English;
  6. There is no logical or acceptable reason to ascribe the meaning of any Chinese characters to Applicant’s Mark. Applicant’s Latin-character Mark is a coined word with no translation in a foreign language or meaning which can be attributed.

Applicant concludes that ZHIMA is a coined term, not a foreign word; therefore, a translation/transliteration statement is not necessary.

Although I’m not a lawyer, I do know a thing or two about Pinyin, Chinese characters, and the difference between languages (e.g., Mandarin, English, Swahili, Hebrew) and scripts (the means of writing those languages, e.g., Chinese characters, the Roman alphabet, the Hebrew alphabet). So I feel confident in stating that Alibaba’s claims were risible.

The ruling also quotes the Applicant as claiming that “it is the Chinese characters which translate to ‘sesame’ and that ‘zhima’ is merely a transliteration/pronunciation of these Chinese characters.”

The ruling sums that up as follows: “In other words, according to Applicant the Chinese characters 芝麻 pronounced ZHIMA mean ‘sesame,’ but ‘Zhima’ itself has no meaning.” Elsewhere in the ruling there is this:

Applicant argues, in essence, that while the Chinese characters pronounced ZHIMA means “sesame,” ZHIMA, in and of itself, has no meaning. This is because “the Latin characters ‘zhima’ or ‘zhi ma’ merely represent the transliteration/sounds of particular Chinese characters that are not part of the mark as filed” (i.e., ZHIMA). Without the Chinese characters, ZHIMA has no meaning.

I believe most people would have no trouble laughing at the claim that zhima (the way to write in Pinyin the Mandarin word for sesame) has “no meaning” but is merely something coined by the company. Would anyone believe that this was just some sort of coincidence?

The authorities at the Patent and Trademark Office of course had no trouble finding plenty of examples of zhima being used as such to write the Mandarin word for sesame, including by Alibaba itself. And so the application for a U.S. trademark on “ZHIMA” as a coined word that was supposedly not Mandarin at all but merely something without meaning was rejected once and for all. Importantly, this decision sets a precedent, which should help stop such claims in the future.

Although I’m pleased that the correct decision was reached, I don’t think the decision was necessarily a foregone conclusion, however obviously absurd the claims of Alibaba were. The problem is that a lot of people, including many who really should know better, actually believe nonsense such as the claim that Chinese characters are necessary to convey the meaning of Mandarin words. The truth is that Mandarin is a language, and Chinese characters and Hanyu Pinyin are scripts (means for writing that language). Chinese characters are not some sort of über language. And, by extension, no matter how many times such claims are repeated, even in what would normally be considered reputable sources, there is no such thing as an “ideographic language” or a “logographic language.”

Speech is primary, not secondary, to the existence of a living language. If by some sort of quirk in the universe every single Chinese character vanished from the face of the Earth, Mandarin would still exist, hundreds of millions of people would still be speaking it with one another, and the Mandarin word for sesame would still be zhima, regardless of how one might write it or what the lawyers for a huge company claim.

Further reading: “Open Sesame” Without Translation Won’t Open Door to Trademark Registration, Lexology, February 2, 2023

Atomic Enema Gwoyeu Romatzyh

box for a product with the English name of Atomic Enema

I know what you’re thinking: “Man, look at the weird romanization in that address!” ;-)

Say what you will against the Gwoyeu Romatzyh romanization system for Mandarin (or “GR” for short) — its quirkiness, its unnecessary complications, its counter-intuitiveness for those who don’t know its rules (much more so than with Hanyu Pinyin). But at least in the few instances where it’s still seen in the wild, it’s usually spelled correctly.

That’s not the case here.

The address for the manufacturer, the Health Chemical Pharmaceutical Co., Ltd., is given as

No.12, Yeou-4th Rd., Ta-Chia Yowshy Ind. Dist.
大甲工業區幼四路12號

  • yeou = Hanyu Pinyin yǒu — misspelled GR (should be “yow,” which is “yòu” in HP); this is all the more strange given that the company gets “yow” correct elsewhere in the same line
  • ta = HP dà — essentially correct Wade-Giles (not GR)
  • chia = HP jiǎ — essentially correct Wade-Giles (not GR)
  • yow = HP yòu — correct GR
  • shy = HP shī — misspelled GR (should be syh)

This is definitely misspelled Gwoyeu Romatzyh rather than a different system (such as MPS2, which is often seen in the boondocks of Taiwan).

And the city name is given as “Taichung,” which is bastardized Wade-Giles (for what would be spelled “Taizhong” in Hanyu Pinyin). But since that is the standard spelling in Taiwan, one can’t blame the company for this.

And at least the company didn’t get “4th” wrong, which is more than can be said for the Taichung City Government, as shown by a sign near the factory. (From Google Street View.)

The source of the other misspellings will likely remain enema-migmatic.

Street sign reading 'You 4rd Rd.'

Big Pinyin on Chengdu Storefronts

Fan Yiying and Gu Peng have posted a story at Sixth Tone that is both surprising and not surprising at all: State Media Criticizes Chengdu Shop Signs in Romanized Chinese.

The main points I’d like to make about this are:

  • Word-parsing matters.
  • Hundreds of millions of people in China use Hanyu Pinyin on a daily basis but still do not know how Pinyin is meant to work as an orthographic system.
  • The government of China, though it needs Pinyin, is in many ways hostile to it.
  • The fonts available for writing the Roman alphabet (and thus Pinyin) far exceed those for writing Chinese characters, so there is nothing in the least artistically limiting about Pinyin per se. (Whether Chinese characters are intrinsically more beautiful than the Roman alphabet is another matter.)

Here are some screenshots from the video mentioned in the article. Note: This isn’t the loveliest voice ever….

Sorry about the triangles on the photos, which make the shots look like videos. I wasn’t good at capturing screenshots without pausing the video, which made the triangles appear.

signs reading DIAN XIAN DIAN LAN, etc.

signs reading HONG DA TU WEN and MIAN DAO

signs reading HAO QI DENG SHI and ER LIANG WAN ZA MIAN

ER LIANG WAN ZA MIAN

ER LIANG WAN ZA MIAN sign in Chinese characters

Article on early Tongyong Pinyin on Taipei street signs

Reader Jens Finke recently came across a newspaper clipping from about twenty years ago, the dark ages of Taipei’s street signs. Back then most roads in the city were identified in bastardized Wade-Giles and wildly misspelled variations thereof. Two or even more spellings for one name at the same intersection were not uncommon. (Outside of Taipei, many signs were in MPS2, which is often mistaken, including in the article below, for the Yale system.) And so the foreign community of Taiwan by and large cried out for the use of Hanyu Pinyin. But that’s not what foreigners got. Instead, Taipei Mayor Chen Shui-bian decided to go with a half-baked local invention called Tongyong Pinyin.

Really, half-baked. Incredibly, not long after street signs started to go up in this system in 1998, its creator changed it. For example, the article mentions “Zhongsiao” (“Zhongxiao” in Hanyu Pinyin). Scarcely had the paint dried on the new street signs than the spelling in the supposedly same system was changed to “Jhongsiao.” This and other changes rendered most of the new signs obsolete.

But before many signs went up in the old new system or the new new system, Chen lost his December 1998 reelection bid. His successor, Ma Ying-jeou, didn’t pursue Tongyong Pinyin. Ma even took the surprising step of asking foreigners what they wanted and took action to implement the overwhelming choice of the foreign community (both then and now): Hanyu Pinyin, though unfortunately the road to this was not without monumentally foolish detours, bad ideas, and still-unfixed errors.

In 2000, Chen was elected president. He asked his minister of education, Ovid Tzeng, to decide on a romanization system for Taiwan. After Tzeng picked Hanyu Pinyin, he was given the boot. His successor saw the writing on the wall and quickly announced his support of Tongyong Pinyin. Meanwhile, Ma, who remained mayor of Taipei, said he had no plan to change to Tongyong Pinyin. This time marks the beginning of Taiwan’s romanization wars, which raged in the first decade of the century and have still not been completely resolved.

Some readers may suspect the reporter in the article below of pulling people’s legs (e.g., “Special thanks to janitorial assistant Shaw Toe-now of the Jyii Horng Bus Company in Tainan for faxing a copy of his employer’s self-designed romanization table”). But I assure you, it would be very difficult to outdo the craziness of Taiwan’s romanization situation back in those days.

Feel free to use the comments section below if you’d like to share any recollections of Taiwan’s signage mess of the 1990s and before.

In my transcription, I’ve fixed a few typos and omitted the article’s Cyrillic system for Mandarin.

photo of newspaper article on the enactment in Taipei of an early version of Tongyong Pinyin

Friday, May 8, 1998

It’s all Roman
By Ian Lamont
STAFF REPORTER

Throw out all of the new business cards, office stationery and checkbooks that you ordered a few months back to include Taipei’s new telephone numbers. Just three months after the phone company made all the city’s phone numbers eight digits long, the Taipei City Government has decided it wants to institute a new romanization system for street signs to make the city more accessible to international visitors.

Well, at least that’s the plan. Someone in the city government’s vast bureaucracy finally figured out that the screwed-up mix of Wade-Giles and Yale (the same guys who brought you “Peking”) was not really helping anything by having foreign nationals attempting to say “Jen-ai Road” or “Kien-kwo South Road” to bewildered taxi drivers.

Not that taxi drivers won’t be any less confused by the new linguistic concoctions that will result under the new system:

“I’d like to go to Her-ping West Road, please.”

“Huh?”

“You know, Her-ping West Road. It’s on the way to Manka?”

In case you didn’t understand this little exchange, “Herping” (rhymes with “burping”) is the new Mandarin romanization for the current Hoping East/West Road, while “Manka” is the Taiwanese name for Taipei’s Wanhua neighborhood. According to the Taipei City Government, both of these names will be in common use once all the city’s street signs are replaced.

Professor Yu Boh-chuan, the Academia Sinica linguist who helped design the new system, says his way reflects the local culture while at the same time following international standards.

Currently, there is only one international standard — the hanyu pinyin system developed by China some forty years ago and now almost universally accepted as the official Mandarin romanization system by governments, universities, libraries and publishers around the world. While there are many similarities between hanyu pinyin and Taipei’s new system, there are also several glaring differences, most notably the puzzling use of the letter “r” at the end of some syllables, the omission of the palatal spirant “sh” sound in certain Mandarin words, and the inclusion of Taiwanese, Hakkanese and Aborigine place names.

Since Taipei will soon have at least three different romanization systems floating around, Weekend has decided to create a handy chart that will help readers (and potentially psychotic mail sorters) survive the sticky transition period.

As an added bonus, we’ve decided to include several other alternative spelling systems for non-Chinese speakers. Special thanks to janitorial assistant Shaw Toe-now of the Jyii Horng Bus Company in Tainan for faxing a copy of his employer’s self-designed romanization table, as well as Prof. Vladimir Torostov of the Sinitic Languages Department of Khabarovsk University in Russia for submitting a conversion table with the cyrillic spellings for Taipei street names. Dosvidanya!

Old Romanization | New Romanization | Mainland | Jyii Horng Bus Co.
Chunghsiao | Zhongsiao | Zhongxiao | Chunggshaw
Jenai | Renai | Renai | Lenie
Hsinyi/Shinyi | Sinyi | Xinyi | Shynyii
Hoping | Herping | Heping | Huhpeeng
Keelung | Kelang | Jilong | Cheerlurng
Pateh | Bader | Bade | Patiih

Reasons Gwoyeu Romatzyh never caught on, part 39

sign with a color photograph of a woman, with 'Eel Chyi 爾旗時尚' written beneath her

Eel Chyi

Here’s a sign spotted in Banqiao, Taiwan, for what would be written “Ěrqí” in Hanyu Pinyin.

“Ěrqí shíshàng” means “Erqi Fashion” (爾旗時尚), with the first word pronounced roughly like the English name “Archie.”

The doubled vowel (“ee”) is a marker of the Gwoyeu Romatzyh romanization system (or “GR” for short), in which doubled vowels indicate the third tone. Thus, “ee” in Gwoyeu Romatzyh equals “ě” in Hanyu Pinyin. As for the -l, that’s GR’s way of indicating -r. For those of you wondering why GR didn’t just use -r for -r, that’s because GR uses -r to indicate second tone … except when it uses other letters to do the same thing. It’s kinda complicated. For example:

  1. ēr = el
  2. ér = erl
  3. ěr = eel
  4. èr = ell

And

  1. qī = chi
  2. qí = chyi
  3. qǐ = chii
  4. qì = chih
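The two lists above amount to a lookup table keyed by syllable and tone. Here is a minimal sketch of that lookup in Python, covering only the two syllables listed here (er and qi). It is not a general Pinyin-to-GR converter, since GR's tonal spelling rules vary by syllable, and the function names are my own.

```python
import unicodedata

# Tonal spellings copied from the two lists above: Hanyu Pinyin base
# syllable -> Gwoyeu Romatzyh spellings for tones 1 through 4.
GR_SPELLINGS = {
    "er": ("el", "erl", "eel", "ell"),
    "qi": ("chi", "chyi", "chii", "chih"),
}

# Combining marks that Hanyu Pinyin uses for tones 1-4
# (macron, acute, caron, grave).
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def split_tone(syllable: str) -> tuple[str, int]:
    """Split a tone-marked Pinyin syllable into (base, tone number).

    The tone number is 0 if no tone mark is found.
    """
    base, tone = [], 0
    # NFD separates each accented letter into letter + combining mark.
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            base.append(ch)
    return "".join(base), tone


def to_gr(syllable: str) -> str:
    """Look up the GR spelling of a tone-marked Pinyin syllable."""
    base, tone = split_tone(syllable)
    if base not in GR_SPELLINGS or tone == 0:
        raise ValueError(f"no GR spelling on file for {syllable!r}")
    return GR_SPELLINGS[base][tone - 1]


if __name__ == "__main__":
    # The two syllables of the shop name, Ěr + qí.
    print(" ".join(to_gr(s).capitalize() for s in ("\u011ar", "q\u00ed")))
```

Feeding in the two syllables of “Ěrqí” gives back the spelling on the sign, “Eel Chyi.”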

Of course, Hanyu Pinyin’s q isn’t intuitive for most people used to reading in an alphabetic script but must be learned. Once learned, though, q is entirely consistent. And it must be noted that as quirky as Gwoyeu Romatzyh can be, its oddities are nothing compared to those of Chinese characters.