The following is a guest post by Victor H. Mair.


How do we learn languages, after all? By following rules, whether hard-wired or learned? Or by acquiring and absorbing principles and patterns through massive amounts of repetition?

“AI is changing scientists’ understanding of language learning — and raising questions about innate grammar,” a stimulating new article by Morten Christiansen and Pablo Contreras Kallens that first appeared in The Conversation (10/19/2022) and later in Ars Technica and elsewhere, begins thus:

Unlike the carefully scripted dialogue found in most books and movies, the language of everyday interaction tends to be messy and incomplete, full of false starts, interruptions and people talking over each other. From casual conversations between friends, to bickering between siblings, to formal discussions in a boardroom, authentic conversation is chaotic. It seems miraculous that anyone can learn language at all given the haphazard nature of the linguistic experience.

I must say that I am in profound agreement with this scenario. In many university and college departments, which consist entirely of learned professors, you’d think that discussions and deliberations would be governed by regulations and rationality. Such, however, is not the case. Instead, people constantly talk over and past each other, barely listening to what their colleagues are saying. They interrupt one another and engage in aggressive behavior, or erupt in mindless laughter over who knows what. I’m not saying that all members of these departments are like this, or that all departments are, but far too many do converse in this fashion. The individuals who are more sedate and civilized tend to remain silent for hours on end because, as the saying goes, they can’t get a word in edgewise. It’s a wonder that departments can accomplish anything.

For this reason, many language scientists – including Noam Chomsky, a founder of modern linguistics – believe that language learners require a kind of glue to rein in the unruly nature of everyday language. And that glue is grammar: a system of rules for generating grammatical sentences.

Everybody knows these things — or knew them decades ago — but now they are indubitably passé.

Children must have a grammar template wired into their brains to help them overcome the limitations of their language experience – or so the thinking goes.

This template, for example, might contain a “super-rule” that dictates how new pieces are added to existing phrases. Children then only need to learn whether their native language is one, like English, where the verb goes before the object (as in “I eat sushi”), or one like Japanese, where the verb goes after the object (in Japanese, the same sentence is structured as “I sushi eat”).

But new insights into language learning are coming from an unlikely source: artificial intelligence. A new breed of large AI language models can write newspaper articles, poetry and computer code and answer questions truthfully after being exposed to vast amounts of language input. And even more astonishingly, they all do it without the help of grammar.

Now, however, the authors make an astonishing claim. They assert that AI language models produce language that is grammatically correct, but they do so without a grammar!

Even if their choice of words is sometimes strange, nonsensical or contains racist, sexist and other harmful biases, one thing is very clear: the overwhelming majority of the output of these AI language models is grammatically correct. And yet, there are no grammar templates or rules hardwired into them – they rely on linguistic experience alone, messy as it may be.

GPT-3, arguably the most well-known of these models, is a gigantic deep-learning neural network with 175 billion parameters. It was trained to predict the next word in a sentence given what came before across hundreds of billions of words from the internet, books and Wikipedia. When it made a wrong prediction, its parameters were adjusted using an automatic learning algorithm.

Remarkably, GPT-3 can generate believable text reacting to prompts such as “A summary of the last ‘Fast and Furious’ movie is…” or “Write a poem in the style of Emily Dickinson.” Moreover, GPT-3 can respond to SAT level analogies, reading comprehension questions and even solve simple arithmetic problems – all from learning how to predict the next word.
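To make the phrase “predict the next word given what came before” concrete, here is a deliberately tiny sketch in Python. It is not how GPT-3 works: GPT-3 adjusts 175 billion neural-network parameters and looks at long stretches of preceding text, while this toy merely counts which word follows which in a hypothetical four-sentence corpus and looks back only one word. What it shares with the large models is the training signal: no grammar rules are supplied, only examples of what tends to come next.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """For each word, count which words immediately follow it in the corpus."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with its successor: ("i", "eat"), ("eat", "sushi"), ...
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent successor of `word`, or None if never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus: no grammar rules are given, only example sentences.
corpus = ["I eat sushi", "I eat rice", "you eat sushi", "I like tea"]
model = train_bigrams(corpus)
print(predict_next(model, "I"))      # eat   (follows "I" twice, "like" once)
print(predict_next(model, "eat"))    # sushi (follows "eat" twice, "rice" once)
print(predict_next(model, "sushi"))  # None  (nothing ever follows "sushi")
```

The verb-before-object pattern of “I eat sushi” emerges here from the counts alone; a real language model replaces the count table with learned parameters and conditions on far more context.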

The authors delve more deeply into comparisons of AI models and human brains, not without raising some significant problems:

A possible concern is that these new AI language models are fed a lot of input: GPT-3 was trained on linguistic experience equivalent to 20,000 human years. But a preliminary study that has not yet been peer-reviewed found that GPT-2 [a “little brother” of GPT-3] can still model human next-word predictions and brain activations even when trained on just 100 million words. That’s well within the amount of linguistic input that an average child might hear during the first 10 years of life.

In conclusion, Christiansen and Contreras Kallens call for a rethinking of language learning:

“Children should be seen, not heard” goes the old saying, but the latest AI language models suggest that nothing could be further from the truth. Instead, children need to be engaged in the back-and-forth of conversation as much as possible to help them develop their language skills. Linguistic experience – not grammar – is key to becoming a competent language user.

By all means, talk at the table, but respectfully, and not too loudly.


Atomic Enema Gwoyeu Romatzyh

box for a product with the English name of Atomic Enema

I know what you’re thinking: “Man, look at the weird romanization in that address!” ;-)

Say what you will against the Gwoyeu Romatzyh romanization system for Mandarin (or “GR” for short) — its quirkiness, its unnecessary complications, its counter-intuitiveness for those who don’t know its rules (much more so than with Hanyu Pinyin). But at least in the few instances where it’s still seen in the wild, it’s usually spelled correctly.

That’s not the case here.

The address for the manufacturer, the Health Chemical Pharmaceutical Co., Ltd., is given as

No.12, Yeou-4th Rd., Ta-Chia Yowshy Ind. Dist.

  • yeou = Hanyu Pinyin yǒu — misspelled GR (should be “yow,” which is “yòu” in HP); this is all the more strange given that the company gets “yow” correct elsewhere in the same line
  • ta = HP dà — essentially correct Wade-Giles (not GR)
  • chia = HP jiǎ — essentially correct Wade-Giles (not GR)
  • yow = HP yòu — correct GR
  • shy = HP shī — correct GR

This is definitely misspelled Gwoyeu Romatzyh rather than a different system (such as MPS2, which is often seen in the boondocks of Taiwan).

And the city name is given as “Taichung,” which is bastardized Wade-Giles (for what would be spelled “Taizhong” in Hanyu Pinyin). But since that is the standard spelling in Taiwan, one can’t blame the company for this.

And at least the company didn’t get “4th” wrong, which is more than can be said for the Taichung City Government, as shown by a sign near the factory. (From Google Street View.)

The source of the other misspellings will likely remain enema-migmatic.

Street sign reading 'You 4rd Rd.'

Big Pinyin on Chengdu Storefronts

Fan Yiying and Gu Peng have posted a story at Sixth Tone that is both surprising and not surprising at all: State Media Criticizes Chengdu Shop Signs in Romanized Chinese.

The main points I’d like to make about this are:

  • Word-parsing matters.
  • Hundreds of millions of people in China use Hanyu Pinyin on a daily basis but still do not know how Pinyin is meant to work as an orthographic system.
  • The government of China, though it needs Pinyin, is in many ways hostile to it.
  • The fonts available for writing the Roman alphabet (and thus Pinyin) far exceed those for writing Chinese characters, so there is nothing in the least artistically limiting about Pinyin per se. (Whether Chinese characters are intrinsically more beautiful than the Roman alphabet is another matter.)

Here are some screenshots from the video mentioned in the article. Note: This isn’t the loveliest voice ever….

Sorry about the triangles on the photos, which make the shots look like videos. I wasn’t good at capturing screenshots without pausing the video, which made the triangles appear.

signs reading DIAN XIAN DIAN LAN, etc.

signs reading HONG DA TU WEN and MIAN DAO



ER LIANG WAN ZA MIAN sign in Chinese characters

Who you callin’ “grandma”?!

Late last year a police officer in Taichung (Taizhong), Taiwan, was checking on a fifty-something-year-old woman when he made the mistake of addressing her as “ama” (Taiwanese for “grandmother,” and generally preferred here to Mandarin forms for elderly women).

Addressing a fifty-something Taiwanese woman even as “ayi” (auntie) would be inadvisable, assuming, of course, she’s not your actual aunt. But “ama”?

I pity the fool.

In response to complaints, the police have come up with guidelines for how to address members of the public, and most terms are now discouraged.

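Tǒngyī lǜdìng 4 zhǒng chēnghu, rúguǒ shì niánqīng rén, kàn shì xuéshēng, bù fēn nánnǚ, tǒngyī chēnghu “tóngxué,” rúguǒ shì niánqīng nǚxìng, tǒngyī chēnghu “xiǎojiě,” zīshēn (niánzhǎng) nǚxìng zé shì tǒngyī chēnghu “nǚshì,” zhìyú nánxìng, chúle niánqīng xuéshēng zhī wài, dōu chēnghu “xiānshēng.”

Translation: Four forms of address have been standardized. A young person who looks like a student, regardless of gender, is to be called “tóngxué”; a young woman, “xiǎojiě”; a senior (older) woman, “nǚshì”; and all men other than young students, “xiānshēng.”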

So there are now four categories:

  • young people (regardless of gender) who look like students: tóngxué (a term used to refer to students or one’s classmates)
  • young women: xiǎojiě (miss, Ms.)
  • older women: nǚshì (this one’s tricky; it’s more formal than “ma’am”; more like “madame,” I suppose).
  • men who look older than students: xiānshēng (mister, sir)

As I remarked above, “nǚshì” is a bit tricky, and not just in terms of translation. It’s quite formal and something people usually write rather than say. Consider, for example, how one might begin a letter to a stranger with “Dear [name]”; if you were standing in front of that person, you would not open a conversation with the same words.

So, if in doubt, call a Taiwanese woman “xiǎojiě.” But calling a woman in China “xiǎojiě” is not a good idea these days (unless it is combined with a surname), though it was fine when I lived in China back in the early 1990s.

By the way, if you ever need to see if a font face will handle Hanyu Pinyin with tone marks well, “nǚshì” is an excellent test word, as “ǚ” is the combination of letter and tone least likely to be supported.
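The “nǚshì” test can also be run mechanically. The Python sketch below, an illustration rather than a standard tool, normalizes a Pinyin string to precomposed (NFC) form and reports which characters fall outside a given set of supported code points. The coverage set HYPOTHETICAL_FONT is invented for the example: it stands for a font that has the Latin-1 Supplement and Latin Extended-A blocks plus ǎ ǐ ǒ ǔ, but none of the ü-with-tone letters. One simplification to keep in mind: a font lacking the precomposed ǚ (U+01DA) might still render it from u plus combining marks, which this check does not detect.

```python
import unicodedata


def missing_from_font(text: str, supported: set[int]) -> list[str]:
    """Return the distinct characters of `text` (after NFC normalization)
    whose code points are not in `supported`, in order of first appearance."""
    # NFC turns u + U+0308 + U+030C into the single precomposed ǚ (U+01DA),
    # the form most Pinyin text uses.
    nfc = unicodedata.normalize("NFC", text)
    return [ch for ch in dict.fromkeys(nfc) if ord(ch) not in supported]


def describe(chars: list[str]) -> list[str]:
    """Format characters as 'U+XXXX NAME' strings for a readable report."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in chars]


# Hypothetical coverage: printable ASCII, Latin-1 Supplement, Latin Extended-A,
# plus the third-tone vowels ǎ ǐ ǒ ǔ from Latin Extended-B,
# but none of the ü-with-tone letters (ǖ ǘ ǚ ǜ).
HYPOTHETICAL_FONT = (
    set(range(0x20, 0x7F))
    | set(range(0xA0, 0x180))
    | {0x01CE, 0x01D0, 0x01D2, 0x01D4}
)

for word in ["xiǎojiě", "tóngxué", "nǚshì"]:
    print(word, describe(missing_from_font(word, HYPOTHETICAL_FONT)))
```

Only “nǚshì” turns up a gap under this coverage, which is the point of using it as a test word. For a real font file, the fontTools library’s `TTFont(path).getBestCmap()` returns a mapping from code points to glyph names, so the set of its keys could serve as `supported`.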


Year of the Tiger puns, part 1

This is a cute ad for a bakery in Banqiao, Taiwan. The text in Chinese characters reads “虎年送吼禮” (Hǔnián sòng hǒu lǐ).

What’s odd about this is the character 吼, which is the character used to write the Mandarin word “hǒu” (howl, roar). So the text in English reads something like “[In the] Year of the Tiger, give roar gifts.”

This only makes proper sense when one knows that here “hǒu” is standing in for the Taiwanese word for “good” (in Mandarin: hǎo/好).

image with two cute cartoon tigers, one of which is baying. The speech bubble for that is the Chinese character 吼

Cantonese version of Wordle

Wordle is an interesting word game that has gone viral, so you’ve probably already heard of it. I discovered it a few weeks ago through an article in the New York Times.

Now there’s a Cantonese version in romanization: Zidou.

screenshot of Zidou

Is there already a Mandarin version in Hanyu Pinyin or any other romanization system? If not, or if the only existing ones are based on syllables and not words, then there certainly should be.

Further reading: Wordle: As word puzzle takes over the internet, Hong Kong professor creates Cantonese version. Hong Kong Free Press, January 29, 2022.

Old Taipei street sign

Pinyin News reader Channing Bartlett passed along this photo he took c. 1980 in Taipei at the corner of Jianguo North Road section 1 and Chang’an East Road section 2. As you can see, inconsistencies on Taiwan street signs weren’t restricted to matters of romanization. Here we have 建國北路一段 (Jiànguó Běi Lù yī duàn) and 長安東路二段 (Cháng’ān Dōng Lù èr duàn) — or rather “段二路東安長.”

One sign is written left to right, the other right to left.

Also, if you look closely at the characters for lu and duan, you can see that the fonts are different, likely indicating the signs are of different ages. But if one sign was replaced, why not the other? Mysterious are the ways of Taiwan street signs.

Bartlett described the experience of trying to read street signs quickly back then:

As I was on a bus barreling by, I had just a quick moment to read one. But often it took up my quick moment just to see whether it was written L to R or vice versa. The practice was inconsistent, as you can see in this photo.

Follow me

I ran into a reader of this blog the other day, which has left me feeling guilty for not posting anything in recent months. So here’s something I wrote nearly a year ago but never posted. The sign is now long gone, but the linguistic points remain the same.

Near the Banqiao train station is this sign, which advertises small apartments. (At just 13 or 14 ping, about 43 to 46 square meters, even counting each unit’s share of the “public” spaces, they are basically tiny.) It has a lot of points of note for so little text:

  • Chinese characters are used to write an English word: 發樓 (fālóu) = follow.
  • English (“Follow me”) is used as well as Mandarin.
  • Numbers are used to write a Mandarin word: 94, i.e., jiǔ sì (九四) = jiùshì (就是). Note also that this works despite the tones being different.

發樓ME (with the English “Follow me” there for clarity as well)
告別租隊友 live your life (gàobié zū duìyǒu, roughly “say goodbye to roommates”)

image of the large billboard discussed in this post