table-free CSS method for interlinear texts on Web pages

The interlinear version of the Scriptures is the prototype or ideal of all translation.
— Walter Benjamin

[Image: Hebrew-English interlinear text of part of Genesis 11 (Tower of Babel)]

Interlinear texts are probably familiar to most who have studied a foreign language. Interlinear texts on the Web, however, tend to be in the form of tables. And, like most other fans of CSS, I tend to cringe at the word “table.” Moreover, text within tables doesn’t wrap to different window sizes.

I am generally opposed to the practice of displaying texts in both pinyin and Chinese characters interlinearly as opposed to en face. Pinyin was not designed to be an annotation system for Chinese characters but to be a full writing system (orthography) for modern Mandarin. Many if not most people, however, are misinformed about this basic point. Consequently, I try to avoid presenting pinyin in a way that could reinforce the mistaken notion that it is a supplement to characters rather than an independent system. Nevertheless, I recognize that interlinear texts can be useful in some circumstances. Moreover, perhaps others can make less problematic use of an interlinear technique for displaying other languages and scripts.

About six months ago I started to work out a standards-compliant, table-free method for displaying Chinese characters and pinyin interlinearly on Web pages. As is so often the case, once I figured out the basics I became distracted by something else and never finished. A recent request for a way to display ruby text with pinyin, however, has prompted me to present some of my ideas on this in case others might find them useful and produce something with them. And, at any rate, CSS3’s ruby text feature isn’t likely to be implemented by the major browsers anytime soon.

The fundamental approach of the method I recommend is to put individual words/phrases and their pinyin/character equivalents in floated div tags and use CSS to make everything look right. Unfortunately, the method isn’t semantically correct because it uses div and p tags for individual words rather than true blocks of text; but I don’t see that as a big enough problem to resort to the trouble of putting all this into XML. YMMV.

This is adapted from a thumbnail-captioning method detailed on A List Apart.

Floated elements, of course, need to have declared widths. But this gets tricky because words are of various widths. It’s not enough, either, to set widths based on the number of letters or Chinese characters within a block, because in proportional fonts different letters take up different amounts of horizontal space.

The five-letter syllable “chong,” for example, is wider than the five-letter “liang” because the letters l and i are thinner than any of the letters in chong — at least in most fonts. And the widths of pinyin elements do not correspond to the widths of Chinese characters.

With Chinese characters the situation is for the most part different. Note that 哩哩啦啦 and 爽爽快快 take the same amount of horizontal space to write:

哩哩啦啦
爽爽快快

The same, however, is not true of their Pinyin equivalents:

līlīlālā
shuǎngshuǎngkuàikuài

One way to deal with this is “headline counting,” which is an old method copy editors use to help make headlines fit within allotted spaces. Under this system, letters, numbers, and punctuation marks are given different values, based on their approximate width. Here are the values under one headline-counting method:

count value    applicable letters, numbers, punctuation marks
0.5            flitj.,:;!
1.0            abcdeghknopqrsuvxyz[space]I1-[vowels, including i, with tone marks]
1.5            mwABCDEFGHJKLNOPQRSTUVXYZ234567890$?
2.0            MW[em dash]

Thus, “pinyin” would have a count of 5, but “Pinyin” would have a count of 5.5. And “Hanyu Pinyin” would have a count of 12.
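
The counting itself is easy to automate. Here is a minimal Python sketch of the table above. The function names are made up for illustration, and two things are my assumptions rather than part of the headline-counting method: any character the table doesn't list (an apostrophe, say) falls back to 1.0, and a capital letter carrying a tone mark is counted like the plain capital.

```python
import unicodedata

# Width values copied from the headline-counting table.
HALF = set("flitj.,:;!")
ONE_AND_HALF = set("mwABCDEFGHJKLNOPQRSTUVXYZ234567890$?")
DOUBLE = set("MW\u2014")  # \u2014 is the em dash


def char_count(ch: str) -> float:
    """Return the headline-count value of a single character."""
    base = unicodedata.normalize("NFD", ch)[0]
    # Lowercase vowels with tone marks count as 1.0, including accented i.
    if base != ch and base.islower():
        return 1.0
    if base in HALF:
        return 0.5
    if base in ONE_AND_HALF:
        return 1.5
    if base in DOUBLE:
        return 2.0
    # Everything else, including unlisted characters (assumption): 1.0.
    return 1.0


def headline_count(text: str) -> float:
    """Sum the headline-count values of every character in text."""
    total = 0.0
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.combining(ch):
            continue  # a stray combining mark adds no width of its own
        total += char_count(ch)
    return total


print(headline_count("Hanyu Pinyin"))
```

With these values, headline_count("pinyin") comes to 5.0, headline_count("Pinyin") to 5.5, and headline_count("Hanyu Pinyin") to 12.0, matching the figures above.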

To have the text spaced as attractively as possible, counts would also need to be performed for the Chinese characters and then checked against the count for the romanized text to make sure the larger value is used. This is because counts for Pinyin words could result in widths being set smaller than required, such as in the case of lí’è, which is thinner than 罹厄 unless the characters are made to be unusually small relative to the romanization. Deriving a count for the width of Chinese characters, however, is easy, because in most cases they can safely be treated as if they all took the same amount of horizontal space. The value assigned for the counting of Chinese characters would depend on how large you want to make them in relation to the pinyin.
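
The "use the larger value" check can be sketched in a few lines of Python. Everything named here is hypothetical: block_count and is_cjk are illustrative names, the default of 2.0 per Chinese character is only a guess at a sensible ratio and should be tuned to the font sizes in use, and punctuation attached to the characters is counted at 0.5.

```python
def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def block_count(pinyin_count: float, hanzi: str,
                hanzi_value: float = 2.0, other_value: float = 0.5) -> float:
    """Return the larger of the romanization count and the character count.

    pinyin_count is the headline count already worked out for the romanized
    text; hanzi_value and other_value are assumed, adjustable weights.
    """
    hanzi_count = sum(hanzi_value if is_cjk(ch) else other_value for ch in hanzi)
    return max(pinyin_count, hanzi_count)


print(block_count(3.5, "罹厄"))
```

Under these assumed weights, a romanization count of 3.5 set against two characters yields 4.0, so the characters determine the width; a count of 9.5 against three characters stays at 9.5.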

Next, assign a CSS class to the relevant div. I’ve named the classes according to the counts (multiplied by 10). The base text goes inside a paragraph tag. Thus, to put “wèishénme” (count: 9.5, hence class count95) over “為什麼” would require the following code:


<div class="count95">
wèishénme

<p>為什麼</p>
</div>
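
Writing these blocks by hand gets tedious quickly, so here is a rough Python sketch that builds one from a count and the two pieces of text. The names class_name and word_block are hypothetical, and the count is assumed to have been worked out already.

```python
from html import escape


def class_name(count: float) -> str:
    """Turn a count such as 9.5 into a class name such as 'count95'."""
    return f"count{round(count * 10)}"


def word_block(top: str, base: str, count: float) -> str:
    """Build one floated block: annotation text on top, base text in a <p>."""
    return (
        f'<div class="{class_name(count)}">\n'
        f"{escape(top)}\n"
        f"<p>{escape(base)}</p>\n"
        "</div>"
    )


print(word_block("wèishénme", "為什麼", 9.5))
```

Called as shown, this prints the same markup as the hand-written example above. Swapping the first two arguments puts the characters on top and the pinyin underneath.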

The main thing requiring attention is coming up with the correct width for each class. In the CSS for this example, I’ve rounded up counts so that two different classes can have the same width. In a finished version, perhaps they should be given separate widths or the pairs of classes should be combined to make for simpler code.

Here’s the CSS:

   .interlinear div     {
        margin-right: 0.2em;    /* FOR THE SPACES BETWEEN WORDS */
        height: 4.0em;          /* TO KEEP LINES FROM OVERLAPPING */
        }
   .count20, .count25      {
        width: 1.5em;
        }
   .count30, .count35      {
        width: 2.0em;
        }
   .count40, .count45      {
        width: 2.5em;
        }
   .count50, .count55      {
        width: 3.0em;
        }
   .count60, .count65      {
        width: 3.2em;
        }
   .count70, .count75      {
        width: 3.5em;
        }
   .count80, .count85      {
        width: 4.0em;
        }
   .count90, .count95      {
        width: 4.5em;
        }


   .interlinear p   {
        font-size: 100%;
        margin-top: 0.3em;
        line-height: 1em;
        }


  /* ++++++++++++++++++++++ */
  /* the CSS below this point probably does not need to be adjusted */
  /* except to add more 'countXX' classes for longer words  */
  /* ++++++++++++++++++++++ */

   .interlinear div.spacer {
        clear: both;
        height: 0;
        }
  .count20, .count25, .count30, .count35, .count40, .count45, 
  .count50, .count55, .count60, .count65, .count70, .count75, 
  .count80, .count85, .count90, .count95 {
        float: left;
        text-align: center;
        }
  .interlinear p   {
        text-align: center;
        font-family: serif;
        font-size: 100%;
        }
   .interlinear {
        font-family: serif;
        font-size: 100%;
        }
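
For longer words, the extra 'countXX' rules could be generated rather than typed. The Python sketch below is only a starting point: the function name is hypothetical, and it assumes that widths keep growing by half an em per whole count, as the count80 and count90 rules above suggest. Real widths would still need checking by eye.

```python
def extra_count_rules(first: int, last: int, em_per_count: float = 0.5) -> str:
    """Emit paired countN0/countN5 width rules for whole counts first..last,
    followed by one shared float rule for all of the new classes."""
    rules = []
    selectors = []
    for whole in range(first, last + 1):
        pair = [f".count{whole * 10}", f".count{whole * 10 + 5}"]
        selectors.extend(pair)
        width = whole * em_per_count  # assumed linear growth
        rules.append(f"{', '.join(pair)} {{\n    width: {width:.1f}em;\n}}")
    rules.append(
        f"{', '.join(selectors)} {{\n    float: left;\n    text-align: center;\n}}"
    )
    return "\n".join(rules)


print(extra_count_rules(10, 11))
```

This prints rules for count100 through count115, which can be pasted after the existing ones.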

Note the unfortunate but likely necessary use of spacer divs to separate paragraphs by clearing the floated elements. In the HTML these divs take the following form:

<div class="spacer">
&nbsp;
</div>
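
Putting a paragraph together then amounts to wrapping the word blocks in a container and closing with a spacer. A short Python sketch follows. It assumes the container is a div with the class interlinear (the CSS selectors imply a container of that class, but its element type is my guess), and the function name is hypothetical.

```python
SPACER = '<div class="spacer">\n&nbsp;\n</div>'


def interlinear_paragraph(blocks: list[str]) -> str:
    """Wrap already-built word blocks in the container, ending with a spacer
    so that whatever follows starts below the floated blocks."""
    return "\n".join(['<div class="interlinear">', *blocks, SPACER, "</div>"])


example = interlinear_paragraph([
    '<div class="count95">\nwèishénme\n<p>為什麼</p>\n</div>',
])
print(example)
```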

Here’s some of this in action:

Here’s some interlinear text with Pinyin above Chinese characters

 
Duìmiàn de nǚhái kàn guòlai, kàn guòlai, kàn guòlai.
對面 的 女孩 看 過來, 看 過來, 看 過來.

Zhèlǐ de biǎoyǎn hěn jīngcǎi.
這裡 的 表演 很 精彩.

Qǐng bùyào jiǎzhuāng bùlǐbùcǎi.
請 不要 假裝 不理不睬.

 

Here’s some interlinear text with Chinese characters above Pinyin

 
對面 的 女孩 看 過來, 看 過來, 看 過來.
Duìmiàn de nǚhái kàn guòlai, kàn guòlai, kàn guòlai.

這裡 的 表演 很 精彩.
Zhèlǐ de biǎoyǎn hěn jīngcǎi.

請 不要 假裝 不理不睬.
Qǐng bùyào jiǎzhuāng bùlǐbùcǎi.

 

So, does anyone have suggestions for improving this or know how to program a way to automate the process as much as possible?

Korean brands, images, and naming

Choe Yong-shik, the author of What’s Wrong With Korea’s Global Marketing, has some interesting comments on company names and branding in South Korea.

He notes that in 1992 the Korean company Samsung switched its logo, changing from using the Chinese characters 三星 to the Roman alphabet (with a stylized A):
[Image: Samsung logo]

This, he says, is representative of a trend:

Since the 1990s, many companies have carried out similar corporate identity projects that have seen the gradual extinction of the practice of using Chinese character logos. Companies have increasingly leaned toward more appealing names in the Roman alphabet as a means to establish a global brand image.

Using Chinese characters as an international brand image in today’s global market is not only ineffective, but it also borders on silliness.

source: Samsung, LG’s Brand Globalization History, Korea Times, December 26, 2005

mobile phones, voice-recognition software, and Chinese characters

Speech recognition on cellphones is no longer about saying a name and then waiting and hoping that the right number is dialed, many experts say.

With most early versions, users trained their phones to understand commands. But the accuracy of the function in real-world use was sketchy at best and nearly zero if the training was too noisy.

Most new cellphones have voice-recognition software already included; on some others the software can be downloaded. With the most advanced software, users can dictate a text or e-mail message, find a calendar item on the phone or jump directly to a ring tone and buy it with a simple command like “Madonna ring tone.”

This last possibility is especially appealing for carriers, which have content on their mobile portals they are trying to sell clients, most of whom cannot be bothered to click through multiple menus to find what might interest them….

The most compelling market for voice-recognition software might be Asia, because typing ideograms [sic] on a cellular phone is more laborious than using a Western alphabet. Many companies, including NEC, are busily developing products.

source: Voice recognition enters new realm in cellphones, International Herald Tribune, December 26, 2005

icons — please vote

For a long time, making a “favorites icon” (“favicon,” for short) has been on the long to-do list for this site. These icons are small images, just 16 pixels by 16 pixels, that can appear in bookmarks for a Web site and in the address bar. In some browsers, such as Opera, they also appear on the browser tabs, which is a nice touch.

Probably the most common look for icons is achieved by incorporating a letter of the alphabet: Yahoo (a red Y with an exclamation mark), Google (a large blue capital G), Opera (a large red shadowed O), the New York Times (an ornate T), Forumosa (an F).

Some icons use Chinese characters: Wenlin (“Wenlin” in Chinese characters), No-Sword (the Chinese character wu2, meaning “without, nothingness”).

And some are more abstract or pictorial: the Notetab text editor (a white cross against a red background), the Panda’s Thumb (a tiny image of a panda), Photo Net (an image of a camera).

This being the sort of site it is, I’m not going to use a Chinese character — not unless I could fit romanization in as well. And I doubt that can be done within a 16 by 16 square.

Ideally, I’d like to have something in the style of Xu Bing’s “new English calligraphy.” Here’s roughly the effect I’d be shooting for:
[Image: the word “pinyin” written in the style of Chinese characters, after the method of artist Xu Bing]

(That’s “P-I-n-Y-I-n”, in case you’re wondering.)

Unfortunately, however, that sort of thing doesn’t work very well when reduced to icon size. About the best I could come up with is this: [Image: icon for Pinyin Info]. But I’m not so sure about that.

I’d like to get input from my readers. Which of the following do you prefer?

  1. — largely the same as no. 1
  2. — the P is light green
  3. — the P is white
  4. — faux Xu Bing
  5. other (please specify)

Please let me know what you think with a comment here or through e-mail.

If you have an image you’d like to use for your site’s icon but don’t have the software to turn it into icon format, you could try this online favicon generator. It will reduce your image to the correct size and put it in .ico format.

Then place the resulting image, which should be named favicon.ico for maximum browser compatibility, in the root directory of your site. To make Internet Explorer happy, you could also add the following to the head of your HTML:
<link rel="shortcut icon" href="/favicon.ico" />

In other Pinyin Info image news, I’ve added a script to the Pinyin Info home page that will put up random images and links to readings on this site. I hope it helps let people know that there’s a lot more on this site than might appear at first glance.

Finally, since logos and icons are often associated with “ideographs,” this seems like a good place to recommend John DeFrancis’s reading on the ideographic myth, for anyone who hasn’t read that already.

names, ethnicity, and colonialism

Joel at Far Outliers has an interesting post on how Koreans chose Japanese names during the Japanese colonial period. (Spotted on Language Hat.)

Regarding name frequency in Taiwan, I once did some checking of an old version of Chih-Hao Tsai’s invaluable list of Chinese names (in Taiwan) and ended up with the top ten names covering 50 percent of the population. Now that he’s got an improved name-list online, I should check again.

Also here in Taiwan, few aborigines have taken the trouble to change their official names, now that they finally have an alternative to the sinicized versions that had been forced upon them by Taiwan’s officialdom. It will be interesting to see how the situation changes, if at all, now that new national ID cards are finally being issued. For more on this, see Romanization to be allowed on some Taiwan ID cards, including the link in the note.

site to tout Taiwan’s English environment bad beyond belief

Taiwan is touting its “English living environment” with a “carnival” (i.e., a room with a bunch of booths from various government agencies and a few businesses, each with some display at least vaguely associated with English). Awards will be given; I wonder how many of them will be deserved.

Here’s a sample from the carnival Web site’s introduction to the “mascots” for the event:
[Image: American boy]

Hello! We are Mascot Profile, because theme of this year is (ENJOY TAIWAN), make us to be siblings organic to assemble to get together too. In the face of the change of the world environments, because the progress of science and technology makes the mutual distance shorten, so world has merged together already a village . In fact, Taiwan is the same too, because the plasticity of Taiwan is strong, accept degree high, already already like world village, there are various kinds of culture and characteristic . This kind of phenomenon is six of ours. Introduce myself by us right away now , see which one be most lovely!

This machine-translated monstrosity is nothing short of a disgrace.

As for why they need mascots, or why most of these represent people from countries where English is not the native language — that’s beyond me. Perhaps it’s to distract people from the disastrously bad English.

For anyone who would like to attend and perhaps get to see how the “original flavor in Taiwan of Israel opens the prelude, will praise 33 excellent organs,”* the event opens today (Tuesday, December 20) at Taipei 101. For details and more atrocious English, see the Web site for the 2005 English Carnival.

* No, I didn’t make that up either.

Pinyin Info in the New York Times

Pinyin Info made the Reading File of this Sunday’s New York Times, with Victor H. Mair’s essay danger + opportunity ≠ crisis being quoted:

On pinyin.info, a Web site about the Chinese language, Victor H. Mair, a professor of Chinese at the University of Pennsylvania, explodes the myth that “crisis,” in Chinese means both “danger” and “opportunity.”

A whole industry of pundits and therapists has grown up around this one grossly inaccurate formulation. A casual search of the Web turns up more than a million references to this spurious proverb. It appears, … often complete with Chinese characters, on the covers of books, on advertisements for seminars, on expensive courses for “thinking outside of the box” and practically everywhere one turns in the world of quick-buck business, pop psychology, and orientalist hocus-pocus. …

Like most Mandarin words, that for “crisis” (weiji) consists of two syllables that are written with two separate characters, wei and ji. The ji of weiji, in fact, means something like “incipient moment; crucial point (when something begins or changes).” Thus, a weiji is indeed a genuine crisis, a dangerous moment, a time when things start to go awry. A weiji indicates a perilous situation when one should be especially wary. It is not a juncture when one goes looking for advantages and benefits. In a crisis, one wants above all to save one’s skin and neck!

source: By Any Other Name, New York Times, December 18, 2005