Microsoft Translator and Pinyin

[Screenshot: the text described in the post, as treated by Microsoft Translator]

If supplied with the following,

談中國的“語”和“文”的問題,我覺得最好能先了解一下在中國通用的語言。中國的主要語言有哪些?為甚麼我說這個,而不說那個?因為環境?因為被強迫?因為我愛這個語言?因為有必要?因為這個語言很重要?也想想什麼是中國人的共同語言。用一個共同語言有必要嗎?為什麼?別的漢語的去向會怎麼樣?如果你使用中國的共同語言普通話,你了解這個語言的語法(比如“的, 得, 地“ 和“了” 的不同用法)嗎? 知道這個語言的基本音節(不包括聲調)只有408個嗎?

Microsoft Translator produces the following Hanyu Pinyin:

tán zhōngguóde “yǔ” hé “wén”dewèntí, wǒjuéde zuìhǎo néng xiānliǎojiě yì xiàzài zhōngguó tōngyòng de yǔyán。 zhōngguóde zhǔyào yǔyán yǒu nǎxiē? wèishénme wǒshuō zhège ,érbùshuōnàgè? yīnwéi huánjìng? yīnwéi bèi qiǎngpò? yīnwéi wǒài zhège yǔyán? yīnwéi yǒubìyào? yīnwéi zhège yǔyán hěnzhòngyào? yě xiǎngxiǎng shénmeshì zhōngguórén de gòngtóngyǔyán。 yòng yígè gòngtóngyǔyán yǒubìyào ma? wèishénme? biéde hànyǔ de qùxiàng huì zěnmeyàng? rúguǒnǐ shǐyòng zhōngguóde gòngtóngyǔyán pǔtōnghuà , nǐ liǎojiě zhège yǔyán de yǔfǎ ( bǐrú “de,dé, de ”hé“le” de bùtóng yòngfǎ )ma? zhīdào zhège yǔyán de jīběn yīnjié (bùbāokuòshēngtiáo) zhǐyǒu 408gèma?

This has a number of obvious problems:

  • failure to capitalize the first letter in a sentence
  • failure to capitalize proper nouns (e.g., “zhongguo” should be “Zhongguo”) (Here is how to handle proper nouns in Pinyin.)
  • frequent appending of “de” to the word before it (Here is how to handle de in Pinyin.)
  • incorrect punctuation: commas, periods, parentheses, and question marks were not converted from their double-width (i.e., Chinese character) forms to regular roman forms (“,。?()” should appear instead as “,.?()”)
  • occasional incorrect word parsing (e.g., “yì xiàzài”, which splits 一下 and attaches 下 to the following 在)
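
Three of these problems (sentence capitalization, proper nouns, and punctuation) are mechanical enough to sketch as a post-processing step. What follows is only a rough illustration in Python, not anything Microsoft Translator exposes: the names clean_pinyin, PUNCT_MAP, and PROPER_NOUNS are made up for this example, and the one-entry proper-noun list is an assumption standing in for a real lexicon. It does not attempt the harder problems of “de” attachment or word parsing.

```python
import re
import unicodedata

# Double-width (Chinese) punctuation mapped to regular roman forms.
# Written as escapes so the characters survive copy and paste.
PUNCT_MAP = str.maketrans({
    "\uff0c": ",",  # fullwidth comma
    "\u3002": ".",  # ideographic full stop
    "\uff1f": "?",  # fullwidth question mark
    "\uff01": "!",  # fullwidth exclamation mark
    "\uff08": "(",  # fullwidth left parenthesis
    "\uff09": ")",  # fullwidth right parenthesis
})

# Hypothetical one-entry lexicon; a real tool would need a much larger one.
PROPER_NOUNS = {"zhōngguó": "Zhōngguó"}


def clean_pinyin(text: str) -> str:
    """Apply punctuation, proper-noun, and sentence-case fixes to Pinyin."""
    # Normalize so precomposed and combining tone marks compare equal.
    text = unicodedata.normalize("NFC", text)
    text = text.translate(PUNCT_MAP)
    # Capitalize known proper nouns at the start of a word, so that
    # "zhōngguóde" and "zhōngguórén" are caught as well.
    for lower, proper in PROPER_NOUNS.items():
        lower = unicodedata.normalize("NFC", lower)
        proper = unicodedata.normalize("NFC", proper)
        text = re.sub(r"\b" + re.escape(lower), proper, text)
    # Capitalize the first letter of the text and of each new sentence.
    return re.sub(
        r"(^|[.?!]\s+)([^\W\d_])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


sample = "yòng yígè gòngtóngyǔyán\u3002 zhōngguóde yǔyán\uff1f"
print(clean_pinyin(sample))
```

The punctuation has to be converted before the capitalization step, because the sentence-boundary pattern only looks for the roman “.”, “?”, and “!”. A sentence that opens with a quotation mark or parenthesis is left uncapitalized by this sketch.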

In short: Thumbs-down for now. But it might not take too much work for Microsoft to make this significantly better.
