How to find Windows files that contain Chinese characters

Someone just wrote me to ask “Supposing I want to search for a Chinese name or word string across a whole DIRECTORY folder such as comes up in a windows directory search (the folder icon)?”

If you know the characters in question, the search is of course easy. Simply click in the Microsoft Windows File Explorer search box (marked in red in the image below), type in your phrase, and hit ENTER.

But what if you don’t know the phrase in question or you simply want to find all files containing Chinese characters? Normally one would turn to wildcard searches. But Windows File Explorer’s wildcard support is extremely limited, so the trick for finding Chinese characters (Hanzi) in a Microsoft Word document doesn’t work here.

I recommend running a search for an extremely common Chinese character. The most commonly used Hanzi is the one for the possessive particle de: 的

This won’t necessarily find every file with Chinese characters — just as searching files for the letter e won’t necessarily find every document that contains some English; but it’s the best I could think of on short notice.

I created some descriptively titled test documents and put them in a folder together:

  1. This file contains the Hanzi de but not in the title
  2. This file has many Hanzi but not the character for de
  3. This file has no Hanzi except 的 in the file name
  4. This file has no Hanzi in either the file or the file name

Then I ran a search for 的. The results show that Windows File Explorer uncovered the files containing 的 within the contents of the file and/or in the file name (i.e., files no. 3 and 1).

screenshot revealing the search results

Using Windows File Explorer’s search tools to refine the criteria should help speed up searches.

An alternate to de would be the character for :

Does anyone have better or alternate approaches to recommend?
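One alternative, for anyone comfortable running a script, is to skip File Explorer and test every file for any Hanzi at all rather than for one common character. What follows is only a minimal sketch under some loud assumptions: the function names (has_hanzi, find_hanzi_files) are my own inventions, it reads file contents only as plain UTF-8 text (so Word documents and other binary formats are matched by file name alone), and it checks just the basic CJK Unified Ideographs block plus Extension A.

```python
import re
from pathlib import Path

# Basic CJK Unified Ideographs (U+4E00-U+9FFF) plus Extension A (U+3400-U+4DBF).
# Rarer extension blocks are left out of this sketch.
HANZI = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def has_hanzi(text):
    """Return True if the string contains at least one Hanzi."""
    return HANZI.search(text) is not None


def find_hanzi_files(root):
    """Yield (path, in_name, in_contents) for files under root that have Hanzi.

    Assumption: contents are plain text in UTF-8. Files that cannot be
    decoded that way are judged by their file name only.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        in_name = has_hanzi(path.name)
        try:
            in_contents = has_hanzi(path.read_text(encoding="utf-8-sig"))
        except (UnicodeDecodeError, OSError):
            in_contents = False  # unreadable as UTF-8 text
        if in_name or in_contents:
            yield path, in_name, in_contents
```

Looping over find_hanzi_files with the path of the folder in question and printing each result would give a list much like the File Explorer results above, except that a file such as no. 2 (many Hanzi but no 的) should turn up too, provided it is saved as UTF-8 text.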

Bing Maps for Taiwan

The maps of Taiwan put out by Google are plagued with errors in their use of Pinyin. But what about that other big company with deep pockets? You know: Microsoft. How good a job does Microsoft’s Bing do with its maps of Taiwan?
map of Taiwan from Bing, showing Wade-Giles place names

I won’t keep y’all waiting: After examining Bing’s maps of Taiwan the two words that came first to mind were incompetent and atrocious.

The country-level map is odd, offering Wade-Giles. And although the use of the hyphen is irregular, I will give Bing points for getting at least Wade-Giles’ apostrophes right. So, although some place names on the map are decades out of date (e.g., Hsin-chuang, Chungli, Chunan, Kuang-fu), at least they’re not horribly misspelled within that system.

It’s at the street level that Bing’s weirdness becomes most apparent. For example, below is part of Bing’s map of Banqiao.

I added the highlighting.


This tiny but representative fragment of the map has not one but four romanization systems:

  • MPS2: Gung Guang, Min Chiuan, Shin Fu (Even within MPS2, none of those should have spaces or extra capital letters.)
  • Hanyu Pinyin: Banqiao (This is the only properly written place name on this map fragment.)
  • Tongyong Pinyin: Jhancian, Sianmin, Sin Jhan
  • Gwoyeu Romatzyh(!): Shinjann (This is the same road as the one marked “Sin Jhan”. In Hanyu Pinyin, which is what officially should be used here, this is written “Xinzhan”.)

A few more points about this small fragment of the map:

  • Wen Hua could be either MPS2 or Hanyu Pinyin, but not Tongyong Pinyin. And it should be Wenhua.
  • Minan is missing an apostrophe. (It should be Min’an.)
  • Banchiao is just wrong, regardless of the system. They were probably going for MPS2 but erroneously used an o instead of a u: Banchiau.
  • Sec 1 Rd should be Rd Sec. 1.
  • Mrt should be MRT.

So that’s four systems, plus additional errors.
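The two orthographic points above (syllables of one name written solid, as in Wenhua, and an apostrophe before a non-initial syllable that begins with a, e, or o, as in Min’an) are mechanical enough to sketch in a few lines of code. This is only an illustration: join_pinyin is a hypothetical helper of my own, it assumes toneless syllables that are already in Hanyu Pinyin, it uses a plain ASCII apostrophe, and it does nothing to convert MPS2, Tongyong Pinyin, or Gwoyeu Romatzyh spellings.

```python
def join_pinyin(syllables):
    """Join the syllables of one name into a single Hanyu Pinyin word.

    Assumes toneless Hanyu Pinyin syllables; no conversion between
    romanization systems is attempted.
    """
    parts = [s.lower() for s in syllables if s]
    if not parts:
        return ""
    word = parts[0]
    for syllable in parts[1:]:
        # An apostrophe goes before a non-initial syllable starting with a, e, or o.
        if syllable[0] in "aeo":
            word += "'"
        word += syllable
    # Only the first letter of the name is capitalized.
    return word.capitalize()


print(join_pinyin(["Wen", "Hua"]))  # -> Wenhua
print(join_pinyin(["Min", "an"]))   # -> Min'an
```

A fix like this would clean up spacing, stray capitals, and missing apostrophes, but it would leave the deeper problem of mixed systems untouched.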

There’s much, much more that’s wrong with this than is right. That’s even more evident on a larger map — and that’s without me bothering to mark orthographic problems in the Pinyin (e.g., Wen Hua instead of the correct Wenhua).

Here bastardized Wade-Giles (e.g., “Mrt-Hsinpu” at top, center — and, FWIW, in the wrong location) has been added to the mix, making a total of five different romanization systems, as well as some weird spellings, e.g., U Nung, Win De, Bah De, Ying Sh — and that’s without including my favorite, JRLE, because that one is correct in MPS2 (“Zhile” in Hanyu Pinyin).

The main point is that the vast majority of names are spelled wrong. And among the few that are spelled correctly, those that are written with correct orthography can be counted on one hand. So, to the words above (incompetent and atrocious) let me add FUBAR.

The copyright statement lists not only Microsoft but also Navteq. The Taiwan maps on the latter company’s site, however, are different from those on Bing. Navteq’s are generally in Hanyu Pinyin, though almost invariably improperly written (e.g., Tai bei Shi, Ban Qiao Shi). And despite the prevalence of Hanyu Pinyin, they still contain other romanization systems (e.g., Jhong Shan) and outright errors (e.g., Shin Jahn).

So an update from Navteq wouldn’t be nearly enough to fix Bing’s problems, which are fundamental.