Table-free CSS method for interlinear texts on Web pages

The interlinear version of the Scriptures is the prototype or ideal of all translation.
— Walter Benjamin

Hebrew-English interlinear text of part of Genesis 11 (Tower of Babel)

Interlinear texts are probably familiar to most people who have studied a foreign language. On the Web, however, interlinear texts tend to be laid out as tables. And, like most other fans of CSS, I tend to cringe at the word “table.” Moreover, a table doesn’t reflow to fit different window sizes: a row of words stays a single row however narrow the window is.

I am generally opposed to displaying texts in pinyin and Chinese characters interlinearly rather than en face (side by side). Pinyin was designed not as an annotation system for Chinese characters but as a full writing system (orthography) for modern Mandarin. Many if not most people, however, are misinformed about this basic point, so I try to avoid presenting pinyin in ways that could reinforce the mistaken notion that it is a supplement to characters rather than an independent system. Nevertheless, I recognize that interlinear texts can be useful in some circumstances, and perhaps others can make less problematic use of an interlinear technique for displaying other languages and scripts.

About six months ago I started working out a standards-compliant, table-free method for displaying Chinese characters and pinyin interlinearly on Web pages. As is so often the case, once I figured out the basics I got distracted by something else and never finished. A recent request for a way to display ruby text with pinyin, however, has prompted me to present some of my ideas here in case others find them useful and can build on them. At any rate, the major browsers aren’t likely to implement CSS3’s ruby text feature anytime soon.

The fundamental approach of the method I recommend is to put each word or phrase, together with its pinyin or character equivalent, in a floated div and use CSS to make everything line up. Unfortunately, the method isn’t semantically correct, because it uses div and p elements for individual words rather than for true blocks of text; but I don’t see that as a big enough problem to justify the trouble of putting all this into XML. YMMV.

This is adapted from a thumbnail-captioning method detailed on A List Apart.

Floated elements, of course, need to have declared widths. This gets tricky because words vary in width. Nor is it enough to set widths based on the number of letters or Chinese characters in a block, because width depends on more than the character count.

The five-letter syllable “chong,” for example, is wider than the five-letter “liang,” because the letters l and i are thinner than any of the letters in chong, at least in most fonts. And the widths of pinyin words do not correspond to the widths of the Chinese characters they stand for.

Chinese characters, by contrast, are for the most part uniform in width. Note that 哩哩啦啦 and 爽爽快快 take the same amount of horizontal space to write:

哩哩啦啦
爽爽快快

The same is not true, however, of their pinyin equivalents:

līlīlālā
shuǎngshuǎngkuàikuài

One way to deal with this is “headline counting,” an old method copy editors use to make headlines fit within allotted spaces. Under this system, letters, numbers, and punctuation marks are given different values based on their approximate width. Here are the values under one headline-counting method:

Count  Letters, numbers, and punctuation marks
0.5    f l i t j . , : ; !
1.0    a b c d e g h k n o p q r s u v x y z, space, I, 1, hyphen (-), and vowels with tone marks (including toned i, such as ī)
1.5    m w A B C D E F G H J K L N O P Q R S T U V X Y Z 2 3 4 5 6 7 8 9 0 $ ?
2.0    M W, em dash

Thus, “pinyin” would have a count of 5, but “Pinyin” would have a count of 5.5. And “Hanyu Pinyin” would have a count of 12.
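To make the counting concrete, here is a minimal Python sketch of the table above, checked against the examples just given. The name `headline_count` is hypothetical, and the handling of characters the table doesn’t list (apostrophes, plain ü, capital vowels with tone marks, all of which fall back to 1.0 here) is an assumption made for this sketch, not part of any standard headline-counting scheme.

```python
import unicodedata

# Values from the headline-count table above.
HALF = set("flitj.,:;!")
ONE_AND_A_HALF = set("mwABCDEFGHJKLNOPQRSTUVXYZ234567890$?")
DOUBLE = set("MW\u2014")  # \u2014 is the em dash
# Lowercase vowels with tone marks count 1.0, even toned i.
TONED_VOWELS = set("āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ")


def headline_count(text: str) -> float:
    """Approximate the headline count of a romanized string."""
    # Compose "e" + combining accent into a single "é" so each letter counts once.
    text = unicodedata.normalize("NFC", text)
    total = 0.0
    for ch in text:
        if ch in TONED_VOWELS:
            total += 1.0
        elif ch in HALF:
            total += 0.5
        elif ch in ONE_AND_A_HALF:
            total += 1.5
        elif ch in DOUBLE:
            total += 2.0
        else:
            # The 1.0 group (a, b, c, ..., space, I, 1, -) and, by
            # assumption, anything the table does not list.
            total += 1.0
    return total


for word in ["pinyin", "Pinyin", "Hanyu Pinyin", "wèishénme"]:
    print(word, headline_count(word))
```

This prints 5.0, 5.5, 12.0 and 9.5, matching the counts above; 9.5 is the count behind the count95 class used for wèishénme.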

To space the text as attractively as possible, you would also need to compute a count for the Chinese characters, compare it with the count for the romanized text, and use the larger value. Otherwise the pinyin count can set a width that is too small: lí’è, for example, is narrower than 罹厄 unless the characters are made unusually small relative to the romanization. Deriving a count for Chinese characters is easy, however, because in most cases they can safely be treated as all taking the same amount of horizontal space. The value assigned to each character depends on how large you want the characters to be relative to the pinyin.
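A sketch of that comparison, again in Python: every Chinese character gets one fixed value and the wider of the two lines wins. The per-character value of 2.0 (`HANZI_VALUE`) is an arbitrary assumption standing in for whatever size you choose, and the pinyin count is passed in as a plain number so the block stands alone; the function names are hypothetical.

```python
import unicodedata

# ASSUMPTION: each Chinese character counts 2.0 (about the width of "M" in the
# headline-count table). The right value depends on how large the characters
# are set relative to the pinyin.
HANZI_VALUE = 2.0


def hanzi_count(hanzi: str, value: float = HANZI_VALUE) -> float:
    """Count every Chinese character (Unicode category "Lo") as the same width.

    Anything else, such as the ASCII punctuation in the examples, gets the
    table's punctuation value of 0.5.
    """
    return sum(value if unicodedata.category(ch) == "Lo" else 0.5 for ch in hanzi)


def word_count(pinyin_count: float, hanzi: str, value: float = HANZI_VALUE) -> float:
    """Use whichever line is wider, so neither line overflows its div."""
    return max(pinyin_count, hanzi_count(hanzi, value))


# lí’è counts 3.5 if its apostrophe is counted as 1.0, but 罹厄 counts 4.0,
# so here the characters decide the width.
print(word_count(3.5, "罹厄"))
print(word_count(9.5, "為什麼"))
```

The first call returns 4.0 (the characters win) and the second 9.5 (the pinyin wins).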

Next, assign a CSS class to the relevant div. I’ve named the classes after the counts multiplied by 10, so a count of 9.5 becomes count95. The base text goes inside a p element. Thus, putting “wèishénme” (which counts 9.5) over “為什麼” would require the following code:


<div class="interlinear"> <!-- one container wraps the whole passage -->
<div class="count95">
wèishénme
<p>為什麼</p>
</div>
</div>
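Since the class name follows mechanically from the count, the markup for each word could be generated rather than typed by hand. This Python sketch (the helpers `count_class` and `word_div` are hypothetical names) produces the word div from the snippet above; the count is supplied by the caller.

```python
from html import escape


def count_class(count: float) -> str:
    """Name a class after its count multiplied by 10, e.g. 9.5 -> "count95"."""
    return f"count{round(count * 10)}"


def word_div(pinyin: str, hanzi: str, count: float) -> str:
    """Emit one word in the pinyin-above-characters layout."""
    return (
        f'<div class="{count_class(count)}">\n'
        f"{escape(pinyin)}\n"
        f"<p>{escape(hanzi)}</p>\n"
        f"</div>"
    )


print(word_div("wèishénme", "為什麼", 9.5))
```

Counts are always multiples of 0.5, so multiplying by 10 always yields a whole number and the rounding only strips the trailing .0.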

The main thing requiring attention is finding the right width for each class. In the CSS for this example, I’ve rounded counts up so that each pair of classes (count20 and count25, for instance) shares a single width. In a finished version, perhaps each class should get its own width, or each pair should be merged into one class to simplify the code.

Here’s the CSS:

   .interlinear div     {
        margin-right: 0.2em;    /* FOR THE SPACES BETWEEN WORDS */
        height: 4.0em;          /* TO KEEP LINES FROM OVERLAPPING */
        }
   .count20, .count25      {
        width: 1.5em;
        }
   .count30, .count35      {
        width: 2.0em;
        }
   .count40, .count45      {
        width: 2.5em;
        }
   .count50, .count55      {
        width: 3.0em;
        }
   .count60, .count65      {
        width: 3.2em;
        }
   .count70, .count75      {
        width: 3.5em;
        }
   .count80, .count85      {
        width: 4.0em;
        }
   .count90, .count95      {
        width: 4.5em;
        }


   .interlinear p   {
        font-size: 100%;
        margin-top: 0.3em;
        line-height: 1em;
        }


  /* ++++++++++++++++++++++ */
  /* the CSS below this point probably does not need to be adjusted */
  /* except to add more 'countXX' classes for longer words  */
  /* ++++++++++++++++++++++ */

   .interlinear div.spacer {
        clear: both;
        height: 0;
        }
  .count20, .count25, .count30, .count35, .count40, .count45, 
  .count50, .count55, .count60, .count65, .count70, .count75, 
  .count80, .count85, .count90, .count95 {
        float: left;
        text-align: center;
        }
  .interlinear p   {
        text-align: center;
        font-family: serif;
        font-size: 100%;
        }
   .interlinear {
        font-family: serif;
        font-size: 100%;
        }
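As the comment in the CSS notes, supporting longer words means adding more countXX classes, and each new pair has to appear in two places: a width rule and the shared float rule. A small generator could keep the two in step. In this Python sketch, `PAIR_WIDTHS`, `pair_selector`, and `generate_css` are hypothetical names; the widths are copied from the CSS above, and widths for any longer words would still have to be found by trial and error.

```python
# Width (in em) for each pair of classes, copied from the CSS above. The key
# is the whole-number part of the count, so 2 covers .count20 and .count25.
PAIR_WIDTHS = {2: 1.5, 3: 2.0, 4: 2.5, 5: 3.0, 6: 3.2, 7: 3.5, 8: 4.0, 9: 4.5}


def pair_selector(whole: int) -> str:
    """Selector for the two classes sharing a width, e.g. 2 -> ".count20, .count25"."""
    return f".count{whole * 10}, .count{whole * 10 + 5}"


def generate_css(pair_widths: dict) -> str:
    rules = []
    # One width rule per pair of classes.
    for whole, width in sorted(pair_widths.items()):
        rules.append(f"{pair_selector(whole)} {{\n    width: {width}em;\n    }}")
    # One shared rule that floats and centers every countXX class.
    selectors = ",\n".join(pair_selector(w) for w in sorted(pair_widths))
    rules.append(f"{selectors} {{\n    float: left;\n    text-align: center;\n    }}")
    return "\n".join(rules)


print(generate_css(PAIR_WIDTHS))
```

Adding an entry such as `10: 5.0` (a made-up width) would produce both a `.count100, .count105` width rule and the matching selectors in the float rule.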

Note the unfortunate but likely necessary use of spacer divs to separate paragraphs by clearing the floated elements. In the HTML these divs take the following form:

<div class="spacer">
&nbsp;
</div>
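Putting the pieces together, a whole passage is a run of word divs inside the .interlinear container, with a spacer div after each paragraph. The Python sketch below assumes a hypothetical input format (a list of paragraphs, each holding `(class, pinyin, characters)` tuples with the class already chosen from the counts); `render_passage` is likewise a made-up name.

```python
from html import escape

# Hypothetical input format: (css_class, pinyin, hanzi) for each word.
Word = tuple[str, str, str]


def render_passage(paragraphs: list[list[Word]]) -> str:
    """Build pinyin-over-characters markup, clearing the floats after each paragraph."""
    parts = ['<div class="interlinear">']
    for words in paragraphs:
        for css_class, pinyin, hanzi in words:
            parts.append(
                f'<div class="{css_class}">\n{escape(pinyin)}\n<p>{escape(hanzi)}</p>\n</div>'
            )
        # The spacer clears the floated word divs so the next paragraph starts on a new line.
        parts.append('<div class="spacer">\n&nbsp;\n</div>')
    parts.append("</div>")
    return "\n".join(parts)


print(render_passage([[("count50", "Zhèlǐ", "這裡"), ("count20", "de", "的")]]))
```

The spacers sit inside the container, which is what the `.interlinear div.spacer` selector expects.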

Here’s some of this in action:

Here’s some interlinear text with pinyin above Chinese characters:

Duìmiàn  de  nǚhái  kàn  guòlai,  kàn  guòlai,  kàn  guòlai.
對面     的  女孩   看   過來,    看   過來,    看   過來.

Zhèlǐ  de  biǎoyǎn  hěn  jīngcǎi.
這裡   的  表演     很   精彩.

Qǐng  bùyào  jiǎzhuāng  bùlǐbùcǎi.
請    不要   假裝       不理不睬.

Here’s some interlinear text with Chinese characters above pinyin:

對面     的  女孩   看   過來,    看   過來,    看   過來.
Duìmiàn  de  nǚhái  kàn  guòlai,  kàn  guòlai,  kàn  guòlai.

這裡   的  表演     很   精彩.
Zhèlǐ  de  biǎoyǎn  hěn  jīngcǎi.

請    不要   假裝       不理不睬.
Qǐng  bùyào  jiǎzhuāng  bùlǐbùcǎi.

So, does anyone have suggestions for improving this, or know how to automate the process as much as possible?