Google Translate and romaji revisited

OK, Google has improved its Pinyin converter some, though it still fails in important areas. So that’s the present situation for Google and Mandarin.

How about for Google and Japanese?

Professor J. Marshall Unger of the Ohio State University’s Department of East Asian Languages and Literatures generously agreed to reexamine Google’s performance in conversions to rōmaji (Japanese written in romanization).

Below is his latest evaluation.

For his initial analysis (in December 2009), see Google Translate and rōmaji.

I ran the test passage through Google Translate again. There’s some improvement, but it’s still pretty mediocre.

Original Google Translate
[original Japanese text not recoverable] 6-Nichi gogo 4-ji 35-fun-goro, Tōkyō-to Chiyoda-ku Kōkyogaien no todō (uchibori-dōri) no Nijūbashi zen kōsaten de, Chūgoku kara no kankō kyaku no 40-dai no dansei ga jōyōsha ni hane rare, zenshin o tsuyoku Utte mamonaku shibō shita. Kuruma wa hodō ni noriagete aruite ita dansei (69) mo hane, dansei wa atama o tsuyoku utte ishiki fumei no jōtai. Marunouchi-sho wa, unten shite ita Tōkyō-to Minato-ku hakkin 3-chōme, kaisha yakuin Takahashi nobe Tsubuse yōgi-sha (24) o jidōsha unten kashitsu shōgai no utagai de genkō-han taiho shi, yōgi o dō chishi ni kirikaete shirabete iru.
[original Japanese text not recoverable] Dōsho ni yoru to, shibō shita dansei wa ōdan hodō o aruite watatte ita tokoro o chokushin shite kita kuruma ni hane rareta. Kuruma wa hidari ni kyū handoru o kiri, shadō to hodō no sakai ni oka reta kasetsu no saku o haneage, hodō ni noriageta toyuu. Saku wa hodō de ran’ningu o shite ita dansei (34) niatari, dansei wa ryōashi ni karui kega.
[original Japanese text not recoverable] Dōsho wa, shibō shita dansei no mimoto kakunin o susumeru totomoni, tōji no kōsaten no shingō no jōkyō o shirabete iru.
[original Japanese text not recoverable] Genba shūhen wa Tōkyō kankō no supotto no hitotsudaga, saikin wa jogingu o tanoshimu hito mo fuete iru.


  • The use of numerals dodges a plethora of errors, but “6-Nichi” is still wrong for Muika.
  • Lots of correct capitalizations have been added, but “uchibori” was missed and “Utte” capitalized by mistake.
  • Some false spaces or lack of spaces persist: “hane rare”, “oka reta”; “hitotsudaga” and “niatari” were correctly hitotsu da ga and ni atari in the original test.
  • Names still get butchered (“hakkin” for Shirogane, “nobe Tsubuse” for Nobuhiro).
  • The needless apostrophe in “ran’ningu” is still there.
  • Interestingly, “toyuu” is a new error: it should be to iu.
  • There’s evidence of some attempt to use hyphens, but why not in “kankō kyaku” or “Nijūbashi zen”?
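
Several of the errors listed above are regular enough to be caught mechanically. The following is only a rough sketch in Python, not anything Google or Prof. Unger uses: the function name `check_romaji` and both lookup tables are hypothetical. It assumes that a numeral followed by “-nichi” is a day of the month, that an apostrophe after n is needed only before a vowel or y, and it knows only the three fused forms whose corrections are given in the list.

```python
import re

# Days of the month with irregular readings (Hepburn romanization).
# Assumption: "<number>-nichi" in the text is a date, not a count of days.
IRREGULAR_DAYS = {
    1: "tsuitachi", 2: "futsuka", 3: "mikka", 4: "yokka", 5: "itsuka",
    6: "muika", 7: "nanoka", 8: "yōka", 9: "kokonoka", 10: "tōka",
    14: "jūyokka", 20: "hatsuka", 24: "nijūyokka",
}

# Fused forms and their corrections, taken from the evaluation above.
FUSED_WORDS = {
    "toyuu": "to iu",
    "hitotsudaga": "hitotsu da ga",
    "niatari": "ni atari",
}

def check_romaji(text):
    """Return (found, suggestion) pairs for a few known error patterns."""
    findings = []

    # 1. Numeral + "-nichi" where the day has an irregular reading.
    for m in re.finditer(r"(\d+)-nichi", text, flags=re.IGNORECASE):
        reading = IRREGULAR_DAYS.get(int(m.group(1)))
        if reading:
            findings.append((m.group(0), reading))

    # 2. Apostrophe after n that is not followed by a vowel or y.
    pattern = r"\w*n['’](?![aiueoyāīūēō])\w+"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        findings.append((m.group(0), re.sub(r"['’]", "", m.group(0))))

    # 3. Words that should have been written with spaces.
    for word in re.findall(r"\w+", text):
        fix = FUSED_WORDS.get(word.lower())
        if fix:
            findings.append((word, fix))

    return findings

sample = "6-Nichi gogo 4-ji 35-fun-goro, hodō de ran’ningu o shite ita; noriageta toyuu."
for found, suggestion in check_romaji(sample):
    print(f"{found} -> {suggestion}")
```

A checker like this only catches surface patterns. Misread names such as “hakkin” for Shirogane need a dictionary of readings, which is well beyond this sketch.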

So, to update: Google gets kudos for conscientiousness, but I stick by my original comments.

For more by Prof. Unger, see the recommended readings, which include selections from The Fifth Generation Fallacy: Why Japan Is Betting Its Future on Artificial Intelligence, Literacy and Script Reform in Occupation Japan: Reading Between the Lines, and Ideogram: Chinese Characters and the Myth of Disembodied Meaning.

2 thoughts on “Google Translate and romaji revisited”

  1. In some romanizations, いう will be written yuu, because it is pronounced like ゆう. This does not make “toyuu” correct, but “to yuu” could be right. (は will be romanized as wa, so why should いう not be romanized as yuu?)

  2. this grad student has his own website here; he developed a pinyin to xiaoerjing converter

    the Arabic xiaoerjing is adequate enough to represent Chinese.

    he does not harp on about anything and does not show any bias when he writes about his linguistic work on his blog, not trying to promote a certain POV, much more professional than this pinyininfo site.
