“No, no, no. It’s spelled ‘Raymond Luxury Yacht,’ but it’s pronounced ‘Throatwobbler Mangrove.’” — Monty Python’s Flying Circus

On February 17, Japan’s Legislative Council presented the country’s justice minister with an outline that would mandate that any kanji in names of newborns entered in official family registers include phonetic readings in kana. It would also restrict some readings.

Readings would also be added to names already in registers.

The changes would likely be enforced starting in the 2024 fiscal year (April 1, 2024, to March 31, 2025).

From a news article:

Currently, family registers do not have a field to indicate phonetic readings. After the law revision, family registers will include phonetic readings of kanji in kana characters.

According to the outlines, certain restrictions will be set on “colorful names” whose phonetic readings in kana characters deviate from the original meanings of the kanji characters.

Not only will newborn babies have pronunciations of their names entered in their family registers, but children and adults whose names already appear in family registers will be allowed to add phonetic readings.

Such people will be allowed to register different readings from ones already in their resident registers — records that are distinct from family registers — but the Justice Ministry calls for careful consideration when registering name readings in family registers.

The government plans to submit the revision bill during the current ordinary Diet session, aiming for it to be enforced in fiscal 2024.

The outlines say that “phonetic readings generally accepted as names” will be allowed in family registers.

A supplementary document to the outlines also calls for flexible management of the new system, given the historical and cultural reality that there have been some phonetic readings that are used only for names.

However, the government plans not to accept phonetic readings of names “that would confuse society.”

Examples of this restriction include readings with a meaning opposite to the kanji’s meaning, those that are difficult to distinguish from misreadings or misspellings, and those with no relation to the meaning of the kanji….

Discriminatory and obscene phonetic readings of names will not be accepted. Nor will names of characters from comics, anime and other fictitious works that would cause discomfort if used as the names of real people.

As current family registers do not have a section to indicate phonetic readings of names, people listed in Japanese family registers do not officially have phonetic reading of names under the Family Register Law.

In contrast, phonetic readings are written on resident registers. However, according to the ministry, those phonetics readings are not legally official but exist for administrative convenience. Currently, phonetic readings on birth registrations are used for resident registration purposes, but not for family registers.

After the law revision goes into force, kana characters for phonetic readings in birth registrations of newborns will also be used in family registers. Those who already have family registers can submit phonetic reading of their names to municipalities within one year after the revised law goes into force.

In particular, people with concerns such as the frequent mispronunciation of their names by others may find it necessary to have their resident registers revised to include the desired phonetic readings of their names. However, as changing the submitted names will require permission from a family court, the ministry urges careful consideration in deciding the name readings to be submitted.

For those who do not submit phonetic readings of names within one year after the enforcement of the revision, the official phonetic reading will be decided based on readings indicated in resident registers after municipal mayors send notifications to their respective residents.

I’m still wondering about the “cause discomfort” part. Discomfort to whom? How?

Japan to add romanization to names on My Number cards

The Japanese government has reportedly decided to add romanization for names on My Number cards, starting next year (2024). My Number cards, also known as Individual Number cards (kojin bangō kādo / 個人番号カード), are a form of national ID.

Here’s basically what they look like now (without a space for romanization):
blank My Number card

But I haven’t been able to find any more specific information yet.

I wrote to the authorities in charge of My Number cards for clarification. I wanted to know what romanization system My Number cards will use: Hepburn, Kunrei-shiki, or something else? Or will people be able to choose any system they want, or to choose from a list of government-approved systems?

I also requested links to any articles/announcements about this in English or Japanese.

Unfortunately, the person who politely responded did not have any information about this beyond what I submitted.

Source: one small mention at the end of this article: Pronunciation of Japanese Personal Names to be Regulated by Planned Law Revision, Japan News (from the Yomiuri Shimbun), February 18, 2023.

More Americans studying in Japan

The number of U.S. students studying abroad in Japan is continuing to increase, having recovered from a sharp decline in the 2010–2011 school year.

This is in contrast to the situation in China, which has been seeing fewer and fewer U.S. students.

graph showing a steady increase in U.S. students studying in Japan from 2000, with a 33% decline in 2010, followed by a recovery that now surpasses the 2009 level.

I’m not sure what accounts for the sharp drop in 2010–2011. It occurred before the March 2011 earthquake and tsunami.

Source: IIE Open Doors Study Abroad Destinations

Languages, scripts, and signs: a walk around Taipei’s Shixin University

Recently I took some trails through the mountains in Taipei and ended up at Shih Hsin University (Shìxīn Dàxué / 世新大學). Near the school are some interesting signs. Rather than giving individual posts for each of these, I’m keeping the signs together in this one, as this is better testimony to the increasing and often playful diversity of languages and scripts in Taiwan.

Cǎo Chuàn

Here’s a restaurant whose name is given in Pinyin with tone marks! That’s quite a rarity here, though I suspect we’ll be seeing more of this in the future. The name in Chinese characters (草串) can be found, much smaller, on a separate sign below.



Right by Cao Chuan is Èrgē de Niúròumiàn (Second Brother’s Beef Noodle Soup). Note the use of the Japanese の rather than Mandarin’s 的; this is quite common in Taiwan.



This store has an ㄟ, which serves as a marker of the Taiwanese language. Here, ㄟ is the equivalent of 的 — and of の.

Bālè ei diàn

A’Woo Tea Bar


I couldn’t find a name in Chinese characters for this place. The name is probably onomatopoeia, as in “Werewolves of London — awoo!”

UTF-8 Unicode vs. other encodings over time

Some eight years ago UTF-8 (Unicode) became the most used encoding on Web pages. At the time, though, it was used on only about 26% of Web pages, so it had a plurality but not an absolute majority.

Graph showing growth of the UTF-8 encoding

By the beginning of 2010 Unicode was rapidly approaching use on half of Web pages.
graph showing a steep rise in the use of UTF-8 and a steep decline in other major encodings

In 2012 the trends were holding up.

Note that the 2008 crossover point appears different in the latter two Google graphs, which is why I’m showing all three graphs rather than just the third.

A different source (with slightly different figures) provides us with a look at the situation up to the present, with UTF-8 now on 85% of Web pages. Expansion of UTF-8 is slowing somewhat. But that may be due largely to the continuing presence of older websites in non-Unicode encodings rather than lots of new sites going up in encodings other than UTF-8.
growth in Unicode UTF-8 encoding on Web pages, 2010-2015

Here’s the same chart, but focusing on encodings (other than UTF-8) that use Chinese characters, so the percentages are relatively low.

And here’s the same as the above, but with the results for individual languages combined.

By the way, Pinyin.info has been in UTF-8 since the site began way back in 2001.
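
For anyone wondering why the choice of encoding matters, here is a minimal Python sketch (my own illustration, not something taken from the sources graphed above). It shows how the same two Chinese characters become different bytes in UTF-8, Big5, and GB2312, and what happens when bytes are read with the wrong encoding. The sample text and the variable names are assumptions chosen for the example.

```python
# Illustrative sketch: the sample text and names are assumptions,
# not data from the charts above.
text = "世新"  # two Chinese characters

# Encode the same text in three encodings, all of which ship with Python.
encoded = {name: text.encode(name) for name in ("utf-8", "big5", "gb2312")}

for name, raw in encoded.items():
    # Show the byte length and the hex bytes for each encoding.
    print(f"{name:7} {len(raw)} bytes  {raw.hex()}")

# Decoding with the matching encoding restores the text...
round_trip = encoded["big5"].decode("big5")

# ...but reading UTF-8 bytes as if they were Big5 yields mojibake.
garbled = encoded["utf-8"].decode("big5", errors="replace")
print(round_trip, garbled)
```

A page that declares one encoding but is served in another produces exactly this sort of garbling, which is one reason a single dominant encoding is a good thing.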


AP language exams and Chinese in U.S. high schools

Today I’m continuing my look at the U.S. high school Advanced Placement foreign language exams, focusing especially on the AP exam in Chinese Language and Culture. (See also AP exams: using highest and lowest scores to look at the case of Chinese.) In the graphs below, “Chinese” is the first column on the left.

The first and obvious point from graphing the numbers of high school students from the class of 2015 who took an AP foreign language exam is the dominance of Spanish. Combined, the exams for Spanish Language and Spanish Literature outnumber all of the other language exams put together … times three.


Now let’s look at the figures above broken down into the grade during which people took the exam. As you can see, there’s something different about when people take the Chinese exam. For all other foreign languages, most people take the exam their senior year. But the Chinese Language and Culture exam is most often taken by juniors.


That’s a little lopsided. So let’s take Spanish and Spanish Lit. out of the mix so we can compare the other languages more easily.

In just a few years Chinese has grown to be the third-most popular AP foreign language exam, behind Spanish and French. OK: way, way behind Spanish and about half of the number that French has. And Chinese comes in fourth if you count Spanish Literature. Still, Chinese now has more test takers than German. And it has more than Latin, Italian, and Japanese put together. But — you knew there’d be a but — the numbers for the AP Chinese Language and Culture exam are relatively large because most of the people who take it already know the language and didn’t learn it in an AP class. That is reflected in the charts above showing when people took the exam. (Note that Spanish also has a relatively high number of juniors taking the exam.)

The closest measure we have for native speakers and others with a much higher level of exposure to the language in question than other students is what students indicate themselves to the College Board on their answer sheets. Here’s how the College Board defines a “standard” student: They “generally receive most of their foreign language training in U.S. schools. They did not indicate on their answer sheet that they regularly speak or hear the foreign language of the exam, or that they have lived for one month or more in a country where the language is spoken.”

Here are the numbers for “standard” students in 2015 across various languages.


In this, Chinese drops from third place to fifth, behind Spanish, French, Latin (for which test takers are not asked the question that defines the standard group), and German, but still ahead of Italian and Japanese. When all test-takers are considered, AP exams in French outnumber those in Chinese by a little less than 2:1, which sounds very impressive (and, to some degree, it is). But when only the standard groups are considered, AP exams in French outnumber those in Chinese by more than 7:1.

Later in this series, we’ll look further at both the standard group and those not in it.

US grad-level enrollments in Japanese continue long decline

Fewer and fewer people are taking graduate-level Japanese classes in U.S. universities, according to data recently released by the MLA.

Graduate-level enrollments in Japanese classes are at their lowest level since 1983 and have declined to less than half of their peak level, which was reached in 1995.

U.S. graduate-level enrollments in Japanese, 1986-2013, showing a peak of 1406 in 1995, a slight decline to 1356 in 1998, and a steeper decline since then, to just 567 in 2013
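
To make the "less than half" claim concrete, here is a small Python sketch of the arithmetic, using only the three figures given in the graph description above. The function and variable names are my own, chosen for illustration.

```python
# Enrollment figures as given in the graph description above (MLA data).
peak_1995 = 1406
count_1998 = 1356
count_2013 = 567


def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new (negative means a decline)."""
    return (new - old) / old * 100


# 2013 enrollments expressed as a share of the 1995 peak.
share_of_peak = count_2013 / peak_1995 * 100

print(f"2013 as share of 1995 peak: {share_of_peak:.1f}%")
print(f"Change 1995 to 1998: {pct_change(peak_1995, count_1998):.1f}%")
print(f"Change 1998 to 2013: {pct_change(count_1998, count_2013):.1f}%")
```

The 2013 figure works out to roughly 40 percent of the 1995 peak, and nearly all of the drop came after 1998.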

Here are a few more years. When looking at the earlier peaks, it’s worth remembering that there are a lot more people in graduate school now than there were several decades ago, both in absolute terms and as a percentage of the population. So the recent figures are even more bleak than they might appear at first glance.

U.S. graduate-level enrollments in Japanese, 1960–2013

You might be wondering how Japanese stacks up against another Asian language. Here’s a comparison with graduate enrollments in Chinese (in blue). Again, the situation isn’t looking good for Japanese.

Graduate enrollments in Japanese vs. graduate enrollments in Chinese, 1986–2013

And here’s a look at the number of undergraduate enrollments in Japanese (green) and Chinese (blue) per enrollment in a graduate course in the same respective language.

Number of undergraduate enrollments in Japanese and Chinese per enrollment in a graduate course for the same language

Even so, boosters of Japanese may take heart that there are still more post-secondary enrollments in Japanese than in Mandarin. But more on that in a later post.

(For those of you who are wondering, no, this blog isn't really back just yet. But I think these numbers are interesting. Also, my MLA-related posts don't need Hanzi or Pinyin diacritics, which would only get messed up anyway. Thus, I might as well post the information for others to see.)

Google Translate and romaji revisited

OK, Google has improved its Pinyin converter some, though it still fails in important areas. So that’s the present situation for Google and Mandarin.

How about for Google and Japanese?

Professor J. Marshall Unger of the Ohio State University’s Department of East Asian Languages and Literatures generously agreed to reexamine Google’s performance in conversions to rōmaji (Japanese written in romanization).

Below is his latest evaluation.

For his initial analysis (in December 2009), see Google Translate and rōmaji.

I ran the test passage through Google Translate again. There’s some improvement, but it’s still pretty mediocre.

Original: 6日午後4時35分ごろ、東京都千代田区皇居外苑の都道(内堀通り)の二重橋前交差点で、中国からの観光客の40代の男性が乗用車にはねられ、全身を強く打って間もなく死亡した。車は歩道に乗り上げて歩いていた男性(69)もはね、男性は頭を強く打って意識不明の重体。丸の内署は、運転していた東京都港区白金3丁目、会社役員高橋延拓容疑者(24)を自動車運転過失傷害の疑いで現行犯逮捕し、容疑を同致死に切り替えて調べている。
Google Translate: 6-Nichi gogo 4-ji 35-fun-goro, Tōkyō-to Chiyoda-ku Kōkyogaien no todō (uchibori-dōri) no Nijūbashi zen kōsaten de, Chūgoku kara no kankō kyaku no 40-dai no dansei ga jōyōsha ni hane rare, zenshin o tsuyoku Utte mamonaku shibō shita. Kuruma wa hodō ni noriagete aruite ita dansei (69) mo hane, dansei wa atama o tsuyoku utte ishiki fumei no jūtai. Marunouchi-sho wa, unten shite ita Tōkyō-to Minato-ku hakkin 3-chōme, kaisha yakuin Takahashi nobe Tsubuse yōgi-sha (24) o jidōsha unten kashitsu shōgai no utagai de genkō-han taiho shi, yōgi o dō chishi ni kirikaete shirabete iru.

Original: 同署によると、死亡した男性は横断歩道を歩いて渡っていたところを直進してきた車にはねられた。車は左に急ハンドルを切り、車道と歩道の境に置かれた仮設のさくをはね上げ、歩道に乗り上げたという。さくは歩道でランニングをしていた男性(34)に当たり、男性は両足に軽いけが。
Google Translate: Dōsho ni yoru to, shibō shita dansei wa ōdan hodō o aruite watatte ita tokoro o chokushin shite kita kuruma ni hane rareta. Kuruma wa hidari ni kyū handoru o kiri, shadō to hodō no sakai ni oka reta kasetsu no saku o haneage, hodō ni noriageta toyuu. Saku wa hodō de ran’ningu o shite ita dansei (34) niatari, dansei wa ryōashi ni karui kega.

Original: 同署は、死亡した男性の身元確認を進めるとともに、当時の交差点の信号の状況を調べている。
Google Translate: Dōsho wa, shibō shita dansei no mimoto kakunin o susumeru totomoni, tōji no kōsaten no shingō no jōkyō o shirabete iru.

Original: 現場周辺は東京観光のスポットの一つだが、最近はジョギングを楽しむ人も増えている。
Google Translate: Genba shūhen wa Tōkyō kankō no supotto no hitotsudaga, saikin wa jogingu o tanoshimu hito mo fuete iru.


  • The use of numerals dodges a plethora of errors, but “6-Nichi” is still wrong for Muika.
  • Lots of correct capitalizations have been added, but “uchibori” was missed and “Utte” capitalized by mistake.
  • Some false spaces or lack of spaces persist: “hane rare”, “oka reta”; “hitotsudaga” and “niatari” were correctly hitotsu da ga and ni atari in the original test.
  • Names still get butchered (“hakkin” for Shirogane, “nobe Tsubuse” for Nobuhiro).
  • The needless apostrophe in “ran’ningu” is still there.
  • Interestingly, “toyuu” is a new error: it should be to iu.
  • There’s evidence of some attempt to use hyphens, but why not in “kankō kyaku” or “Nijūbashi zen”?

So, to update: Google gets kudos for conscientiousness, but I stick by my original comments.
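
As a rough illustration (mine, not Professor Unger's), the errors listed above can be collected into a small lookup table and applied to Google's output. This is only a sketch: `FIXES` and `apply_fixes` are hypothetical names, and where a bullet names an error without giving the corrected form, the correction in the table is my assumption and is marked as such.

```python
# Errors and corrections drawn from the bullet points above. Entries marked
# "assumed" are my guesses where the evaluation names only the error.
FIXES = {
    "6-Nichi": "Muika",
    "uchibori": "Uchibori",
    "Utte": "utte",
    "hane rare": "hanerare",        # assumed: false space removed
    "oka reta": "okareta",          # assumed: false space removed
    "hitotsudaga": "hitotsu da ga",
    "niatari": "ni atari",
    "hakkin": "Shirogane",
    "nobe Tsubuse": "Nobuhiro",
    "ran\u2019ningu": "ranningu",   # assumed: needless apostrophe dropped
    "toyuu": "to iu",
}


def apply_fixes(romaji: str) -> str:
    """Replace each known error with its correction (hypothetical helper)."""
    for wrong, right in FIXES.items():
        romaji = romaji.replace(wrong, right)
    return romaji


# A fragment taken from the Google Translate output above.
print(apply_fixes("hodō ni noriageta toyuu."))
```

A table like this patches only the errors already spotted in this one passage. It says nothing about how a converter should get names or word division right in general.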

