{"id":3019,"date":"2009-11-18T19:05:43","date_gmt":"2009-11-18T11:05:43","guid":{"rendered":"https:\/\/pinyin.info\/news\/?p=3019"},"modified":"2016-12-11T21:27:46","modified_gmt":"2016-12-11T13:27:46","slug":"google-translates-new-pinyin-function-sucks","status":"publish","type":"post","link":"https:\/\/pinyin.info\/news\/2009\/google-translates-new-pinyin-function-sucks\/","title":{"rendered":"Google Translate&#8217;s new Pinyin function sucks"},"content":{"rendered":"<p><a href=\"http:\/\/translate.google.com\/\">Google Translate<\/a> has a new function: conversion to Hanyu Pinyin, which would be exciting and wonderful if it were any good. But unfortunately it&#8217;s <em>terrible<\/em>, all things considered. <\/p>\n<p>What Google has created is about at the same level as scripts hobbyists cobbled together the hard way about a decade ago from early versions of CE-DICT. Don&#8217;t get me wrong: I greatly admire what sites such as Ocrat achieved way back when. But for Google &#8212; with all of its data, talent, and money &#8212; to do essentially no better so many years later is nothing short of a disgrace.<\/p>\n<p>To see Google Translate&#8217;s Pinyin function in action you must select &#8220;Chinese (Simplified)&#8221; or &#8220;Chinese (Traditional)&#8221; &#8212; <em>not English<\/em> &#8212; for the &#8220;Translate into&#8221; option. And then click on &#8220;Show romanization&#8221;. 
<\/p>\n<p>For example, here&#8217;s what happens with the following text from an <a href=\"https:\/\/pinyin.info\/chinese_characters\/simplified_traditional\/zhang_liqing\/english.html\">essay on simplified and traditional Chinese characters<\/a> by Zhang Liqing:<\/p>\n<blockquote><p>&#35527;&#20013;&#22283;&#30340;&#8220;&#35486;&#8221;&#21644;&#8220;&#25991;&#8221;&#30340;&#21839;&#38988;&#65292;&#25105;&#35258;&#24471;&#26368;&#22909;&#33021;&#20808;&#20102;&#35299;&#19968;&#19979;&#22312;&#20013;&#22283;&#36890;&#29992;&#30340;&#35486;&#35328;&#12290;&#20013;&#22283;&#30340;&#20027;&#35201;&#35486;&#35328;&#26377;&#21738;&#20123;&#65311;&#28858;&#29978;&#40636;&#25105;&#35498;&#36889;&#20491;&#65292;&#32780;&#19981;&#35498;&#37027;&#20491;&#65311;&#22240;&#28858;&#29872;&#22659;&#65311;&#22240;&#28858;&#34987;&#24375;&#36843;&#65311;&#22240;&#28858;&#25105;&#24859;&#36889;&#20491;&#35486;&#35328;&#65311;&#22240;&#28858;&#26377;&#24517;&#35201;&#65311;&#22240;&#28858;&#36889;&#20491;&#35486;&#35328;&#24456;&#37325;&#35201;&#65311;&#20063;&#24819;&#24819;&#20160;&#40636;&#26159;&#20013;&#22283;&#20154;&#30340;&#20849;&#21516;&#35486;&#35328;&#12290;&#29992;&#19968;&#20491;&#20849;&#21516;&#35486;&#35328;&#26377;&#24517;&#35201;&#21966;&#65311;&#28858;&#20160;&#40636;&#65311;&#21029;&#30340;&#28450;&#35486;&#30340;&#21435;&#21521;&#26371;&#24590;&#40636;&#27171;&#65311;&#22914;&#26524;&#20320;&#20351;&#29992;&#20013;&#22283;&#30340;&#20849;&#21516;&#35486;&#35328;&#26222;&#36890;&#35441;&#65292;&#20320;&#20102;&#35299;&#36889;&#20491;&#35486;&#35328;&#30340;&#35486;&#27861;&#65288;&#27604;&#22914;&#8220;&#30340;&#65292; &#24471;&#65292; &#22320;&#8220; &#21644;&#8220;&#20102;&#8221; &#30340;&#19981;&#21516;&#29992;&#27861;&#65289;&#21966;&#65311; &#30693;&#36947;&#36889;&#20491;&#35486;&#35328;&#30340;&#22522;&#26412;&#38899;&#31680;&#65288;&#19981;&#21253;&#25324;&#32882;&#35519;&#65289;&#21482;&#26377;408&#20491;&#21966;&#65311;<\/p><\/blockquote>\n<p><img 
loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pinyin.info\/news\/news_photos\/2009\/11\/google_translate_pinyin1.gif\" alt=\"screenshot of Google Translate with the text above\" title=\"google_translate_pinyin1\" width=\"510\" height=\"309\" \/><\/p>\n<p>Google Translate will produce this:<br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pinyin.info\/news\/news_photos\/2009\/11\/google_translate_pinyin2.gif\" alt=\"screenshot of Google Translate with the text above and how Google Translate puts this into Pinyin (see text below)\" width=\"510\" height=\"435\"  \/><\/p>\n<blockquote><p><span class=\"py\">t&#225;n zh&#333;ng gu&#243; de&#8220;y&#468;&#8220;h&#233;&#8221; w&#233;n&#8221; de w&#232;n t&#237;&#65292; w&#466; ju&#233; de zu&#236; h&#462;o n&#233;ng xi&#257;n li&#462;o ji&#232; y&#299; xi&#224; z&#224;i zh&#333;ng gu&#243; t&#333;ng y&#242;ng de y&#468; y&#225;n&#12290;zh&#333;ng gu&#243; de zh&#468; y&#224;o y&#468; y&#225;n y&#466;u n&#462; xi&#275;&#65311;w&#233;i sh&#232;n me w&#466; shu&#333; zh&#232; ge&#65292; &#233;r b&#249; shu&#333; n&#224; g&#232;&#65311;y&#299;n w&#232;i hu&#225;n j&#236;ng&#65311;y&#299;n w&#232;i b&#232;i qi&#462;ng p&#242;&#65311;y&#299;n w&#232;i w&#466; &#224;i zh&#232; ge y&#468; y&#225;n&#65311;y&#299;n w&#232;i y&#466;u b&#236; y&#224;o&#65311;y&#299;n w&#232;i zh&#232; ge y&#468; y&#225;n h&#283;n zh&#242;ng y&#224;o&#65311;y&#283; xi&#462;ng xi&#462;ng sh&#233;n me sh&#236; zh&#333;ng gu&#243; r&#233;n de g&#242;ng t&#243;ng y&#468; y&#225;n&#12290;y&#242;ng y&#299; g&#232; g&#242;ng t&#243;ng y&#468; y&#225;n y&#466;u b&#236; y&#224;o ma&#65311;w&#232;i sh&#233; me&#65311;bi&#233; de h&#224;n y&#468; de q&#249; xi&#224;ng hu&#236; z&#283;n me y&#224;ng&#65311;r&#250; gu&#466; n&#464; sh&#464; y&#242;ng zh&#333;ng gu&#243; de g&#242;ng t&#243;ng y&#468; y&#225;n p&#468; t&#333;ng hu&#224;&#65292; n&#464; li&#462;o ji&#283; zh&#232; ge y&#468; y&#225;n de y&#468; f&#462;&#65288;b&#464; 
r&#250;&#8220;de&#65292; de&#65292; de&#8220; h&#233;&#8220;le&#8221; de b&#249; t&#243;ng y&#242;ng f&#462;&#65289; ma&#65311;zh&#299; d&#224;o zh&#232; ge y&#468; y&#225;n de j&#299; b&#283;n y&#299;n ji&#233;&#65288;b&#249; b&#257;o ku&#242; sh&#275;ng di&#224;o&#65289; zh&#464; y&#466;u408g&#232; ma&#65311;<\/span><\/p><\/blockquote>\n<p>Here&#8217;s what&#8217;s wrong: <\/p>\n<ul>\n<li>This is all <em>bro ken syl la bles<\/em> instead of word parsing. (So it&#8217;s never even a question if they get the use of the <a href=\"https:\/\/pinyin.info\/romanization\/hanyu\/apostrophes.html\">apostrophe<\/a> correct.)<\/li>\n<li>Proper nouns are not capitalized (e.g., <span class=\"py\">zh&#333;ng gu\u00f3<\/span> vs. <span class=\"py\">Zh&#333;nggu\u00f3<\/span>).<\/li>\n<li>The first letter in each sentence is not capitalized.<\/li>\n<li>Punctuation is not converted but remains in double-width Chinese style, which is wrong for Pinyin.<\/li>\n<li>Spacing around most punctuation is also incorrect (e.g., although a space is added after a comma and a closing parenthesis, there&#8217;s no space after a period or a question mark. See also the spacing or lack thereof around quotation marks, numerals, etc.)<\/li>\n<li>Because of lack of word parsing, some given pronunciations are wrong. <\/li>\n<\/ul>\n<p>In my previous post I complained about Google Maps&#8217; unfortunately botched switch to Hanyu Pinyin. I stated there that, unlike Google Maps, Google Translate would correctly produce &#8220;Chengdu&#8221; from &#8220;&#25104;&#37117;&#8221; (which it does when &#8220;translate into&#8221; is set for English). But I see that the romanization bug feature of Google Translate also fails this simple test. 
It generates the incorrect &#8220;ch&#233;ng d&#333;u&#8221;.<\/p>\n<p>All of this indicates that Google apparently is using a poor database and not only has no idea of <a href=\"https:\/\/pinyin.info\/news\/category\/writing-systems\/pinyin-rules\/\">how Pinyin is meant to be written<\/a> but also lacks an understanding of even <a href=\"https:\/\/pinyin.info\/readings\/zyg\/rules.html\">the basic rules of Pinyin<\/a>. <\/p>\n<p>If you should need to use a free Web-based Pinyin converter, avoid Google Translate. Instead use <a href=\"http:\/\/www.popupchinese.com\/tools\/adso\">Adso<\/a> (from the fine folk at <a href=\"http:\/\/www.popupchinese.com\/\">Popup Chinese<\/a>) or perhaps <a href=\"http:\/\/www.nciku.com\/\">NCIKU<\/a> or <a href=\"http:\/\/usa.mdbg.net\/chindict\/chindict.php\">MDBG<\/a> &#8212; all of which, despite their limitations (c&#8217;mon, guys, sentences begin with capital letters), are significantly better than what Google offers. <\/p>\n<p>By the way, Google Translate will also romanize Japanese texts written in kanji and kana, Russian texts written in Cyrillic, etc. But I&#8217;ll leave those to others to analyze. <\/p>\n<p>For lagniappe, here&#8217;s a real Hanyu Pinyin version of the text above: <\/p>\n<blockquote><p><span class=\"py\">T&#225;n Zh&#333;nggu&#243; de &#8220;y&#468;&#8221; h&#233; &#8220;w&#233;n&#8221; de w&#232;nt&#237;, w&#466; ju&#233;de zu&#236;h&#462;o n&#233;ng xi&#257;n li&#462;oji&#283; y&#299;xi&#224; z&#224;i Zh&#333;nggu&#243; t&#333;ngy&#242;ng de y&#468;y&#225;n. Zh&#333;nggu&#243; de zh&#468;y&#224;o y&#468;y&#225;n y&#466;u n&#462;xi&#275;? W&#232;ish&#233;nme w&#466; shu&#333; zh&#232;ge, &#233;r b&#249; shu&#333; n&#224;ge? Y&#299;nwei hu&#225;nj&#236;ng? Y&#299;nwei b&#232;i qi&#462;ngp&#242;? Y&#299;nwei w&#466; &#224;i zh&#232;ge y&#468;y&#225;n? Y&#299;nwei y&#466;u b&#236;y&#224;o? Y&#299;nwei zh&#232;ge y&#468;y&#225;n h&#283;n zh&#242;ngy&#224;o? 
Y&#283; xi&#462;ngxiang sh&#233;nme sh&#236; Zh&#333;nggu&#243;r&#233;n de g&#242;ngt&#243;ng y&#468;y&#225;n? Y&#242;ng y&#299;ge g&#242;ngt&#243;ng y&#468;y&#225;n y&#466;u b&#236;y&#224;o ma? W&#232;ish&#233;nme? Bi&#233;de H&#224;ny&#468; de q&#249;xi&#224;ng hu&#236; z&#283;nmey&#224;ng? R&#250;gu&#466; n&#464; sh&#464;y&#242;ng Zh&#333;nggu&#243; de g&#242;ngt&#243;ng y&#468;y&#225;n P&#468;t&#333;nghu&#224;, n&#464; li&#462;oji&#283; zh&#232;ge y&#468;y&#225;n de y&#468;f&#462; (b&#464;r&#250; &#8220;de&#8221; h&#233; &#8220;le&#8221; de b&#249;t&#243;ng y&#242;ngf&#462;) ma? Zh&#299;dao zh&#232;ge y&#468;y&#225;n de j&#299;b&#283;n y&#299;nji&#233; (b&#249; b&#257;oku&#242; sh&#275;ngdi&#224;o) zh&#464; y&#466;u 408 ge ma?<br \/>\n<\/span><\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Google Translate has a new function: conversion to Hanyu Pinyin, which would be exciting and wonderful if it were any good. But unfortunately it&#8217;s terrible, all things considered. 
What Google has created is about at the same level as scripts &hellip; <a href=\"https:\/\/pinyin.info\/news\/2009\/google-translates-new-pinyin-function-sucks\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,15,113,106,29,28,32,20,126,19,38,31],"tags":[669,668,670],"class_list":["post-3019","post","type-post","status-publish","format-standard","hentry","category-chinese","category-chinese-characters","category-computers","category-hanyu","category-japanese","category-languages","category-mandarin","category-pinyin","category-romaji","category-romanization","category-software","category-writing-systems","tag-google","tag-google-maps","tag-google-translate"],"_links":{"self":[{"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/posts\/3019","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/comments?post=3019"}],"version-history":[{"count":34,"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/posts\/3019\/revisions"}],"predecessor-version":[{"id":7354,"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/posts\/3019\/revisions\/7354"}],"wp:attachment":[{"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/media?parent=3019"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/categories?post=3019"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pinyin.info\/news\/wp-json\/wp\/v2\/tags?post=3019"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}