Combining Pinyin and Chinese character subtitles

With any luck, this will be the last post for some time in my none-too-exciting but hopefully useful series on technical aspects of creating Pinyin subtitles.

Some people like to have Pinyin subtitles and Hanzi subtitles appear at the same time. Although I think that’s generally a bad idea (too much text to get through quickly that way, people would benefit from becoming accustomed to reading Pinyin texts as Pinyin texts, etc.), I’ll go ahead and offer instructions on how to make Pinyin subtitles appear above Chinese character subtitles.

These directions are for Microsoft Word, though other programs could be used instead.

Using Word, open copies of the two subtitle files you’d like to combine.

To get the alignment between the two files to match when they’re combined, it’s important that each subtitle entry is only one line long. You can check for possible instances of multi-line subtitles with a wildcard search (CTRL+H –> More –> Use wildcards).

Find what (with “Use wildcards” checked):
([!0-9])^13([!0-9^13])
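For anyone who would rather run this check outside Word, here is a minimal Python sketch of the same test. It is not the Word search itself. It assumes a standard SRT layout (counter line, timing line, text lines, blank line); the function name `multiline_entries` and the artificially split second entry are made up for illustration.

```python
import re

def multiline_entries(srt_text):
    """Return the counters of entries whose text spans more than one line."""
    found = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # lines[0] is the counter, lines[1] the timing, the rest is text.
        if len(lines) > 3:
            found.append(lines[0].strip())
    return found

# Hypothetical sample: entry 2 has been split over two lines on purpose.
sample = """1
00:00:49,000 --> 00:00:51,500
Yō! Lǐ yé lái la

2
00:00:52,200 --> 00:00:53,600
Lǐ yé
lái la
"""

print(multiline_entries(sample))
```

An empty list means every entry is already one line long and the next step can be skipped.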

If that search finds any multi-line subtitles, you’ll need to temporarily adjust those lines in both subtitle files, as follows:

Find what (with “Use wildcards” checked):
([!0-9])^13([!0-9^13])

Replace with:
\1|\2

Again, be sure to run that search-and-replace in both subtitle files. You’ll replace the “|” with a RETURN later.
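The same temporary joining can be sketched in Python, again assuming a standard SRT layout. The function name `join_multiline` and its `sep` parameter are my own invention for this example, not anything Word provides.

```python
import re

def join_multiline(srt_text, sep="|"):
    """Collapse each entry's text lines into one line joined by `sep`."""
    joined = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        head, text = lines[:2], lines[2:]   # counter + timing, then text
        if text:
            head.append(sep.join(text))
        joined.append("\n".join(head))
    return "\n\n".join(joined) + "\n"

# Hypothetical two-line entry, made up for illustration.
sample = "1\n00:00:52,200 --> 00:00:53,600\nLǐ yé\nlái la\n"
print(join_multiline(sample))
```

As with the Word version, this would need to be applied to both files so that the entries still line up.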

Next, in the file with the Chinese characters (not the Pinyin file), strip out everything except for the text of the subtitles, leaving just the Hanzi text. (I wrote about this earlier in “How to strip subtitle files down to text.” The method is also useful for removing such information if you want to create the text of the screenplay.)

Find what (with “Use wildcards” checked):
^13[0-9:\,\-\> ]{1,}^13

Replace with:
^p

Note: You may need to run the above “replace all” twice for Word to catch everything.
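A rough Python equivalent of that stripping step, as a sketch only: it drops every line made up solely of digits, colons, commas, hyphens, greater-than signs, and spaces, which is the same character set the Word pattern uses. The name `strip_to_text` is hypothetical. Unlike the Word replace, it also drops the leading “1”, and like the Word replace it would wrongly drop a subtitle consisting only of digits.

```python
import re

# Lines containing nothing but counter/timing characters.
NON_TEXT = re.compile(r"^[0-9:,\-> ]+$")

def strip_to_text(srt_text):
    """Keep subtitle text and blank separator lines; drop counters and timings."""
    kept = [line for line in srt_text.splitlines() if not NON_TEXT.match(line)]
    return "\n".join(kept)

sample = """1
00:00:49,000 --> 00:00:51,500
喲! 李爺來啦

2
00:00:52,200 --> 00:00:53,600
李爺來啦
"""

print(strip_to_text(sample))
```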

You should have something that looks like this (with paragraph marks shown):


喲! 李爺來啦¶

李爺來啦¶

秀蓮¶

秀蓮¶

????????¶

Now add extra lines, so the lines with Chinese characters will fit into the new document in the correct places.

Find what (with “Use wildcards” checked):
^13^13

Replace with:
^p^p^p^p^p

Delete the very first line, the one with the “1” in it. Then add three blank lines above the first line of Hanzi.

You should have something that looks like this (with paragraph marks shown):




喲! 李爺來啦¶




李爺來啦¶




秀蓮¶

Select all (CTRL+A). Then convert this to a table:
Table –> Convert –> Text to Table

Now switch to the Pinyin subtitles file.

First, add the extra blank lines into which you will later insert the Chinese characters that correspond to the Pinyin.

Find what (with “Use wildcards” checked):
^13^13

Replace with:
^p^p^p

Convert the Pinyin subtitles to a table:
CTRL+A
Table –> Convert –> Text to Table

Switch back to the Chinese character file. Copy the table there and paste it to the right of the table with the Pinyin text.

You should have something that looks like this:

1  
00:00:49,000 --> 00:00:51,500  
Yō! Lǐ yé lái la  
  喲! 李爺來啦
   
2  
00:00:52,200 --> 00:00:53,600  
Lǐ yé lái la  
  李爺來啦
   
3  
00:01:06,900 --> 00:01:08,400  
Xiùlián  
  秀蓮
   
4  
00:01:09,000 --> 00:01:10,400  
Xiùlián  
  秀蓮

Next, change this back into text:
Table –> Convert –> Table to Text

Remove the tabs:
Find what:
^t

Replace with:
[leave blank]
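If the padding and table steps seem like sleight of hand, this small Python sketch models what they do to a single entry. It assumes each entry occupies five rows after padding and that “Table to Text” separates cells with a tab; the variable names are made up for illustration.

```python
# One entry from each file after the padding steps (five rows per entry).
pinyin_column = ["1", "00:00:49,000 --> 00:00:51,500", "Yō! Lǐ yé lái la", "", ""]
hanzi_column = ["", "", "", "喲! 李爺來啦", ""]

# Pasting the tables side by side pairs the rows up; converting back to
# text puts a tab between the two cells of each row.
rows = [left + "\t" + right for left, right in zip(pinyin_column, hanzi_column)]

# Removing the tabs leaves one line per row, with the Hanzi in the slot
# directly under the Pinyin.
lines = [row.replace("\t", "") for row in rows]
print("\n".join(lines))
```

Because only one cell in each row has any text, deleting the tabs never glues Pinyin and Hanzi together on the same line.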

If you combined any lines earlier, break them apart now:
Find what:
|

Replace with:
^p

Your document should now look like this:

1
00:00:49,000 --> 00:00:51,500
Yō! Lǐ yé lái la
喲! 李爺來啦

2
00:00:52,200 --> 00:00:53,600
Lǐ yé lái la
李爺來啦

3
00:01:06,900 --> 00:01:08,400
Xiùlián
秀蓮

4
00:01:09,000 --> 00:01:10,400
Xiùlián
秀蓮

Save the file as plain text (*.txt), not as a Word document (*.doc). Then later rename this to give it the correct file extension (probably *.srt).
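For those comfortable with a little scripting, the whole procedure can also be approximated in a few lines of Python. This is a sketch under assumptions, not a replacement for checking the result by eye: it assumes both files are well-formed SRT with the same number of entries in the same order, and the names `parse_srt` and `combine` are hypothetical.

```python
import re

def parse_srt(text):
    """Split SRT text into (counter, timing, [text lines]) tuples."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) >= 2:
            entries.append((lines[0].strip(), lines[1].strip(), lines[2:]))
    return entries

def combine(pinyin_srt, hanzi_srt):
    """Place each entry's Pinyin lines above its Hanzi lines."""
    pinyin = parse_srt(pinyin_srt)
    hanzi = parse_srt(hanzi_srt)
    if len(pinyin) != len(hanzi):
        raise ValueError("the two files have different numbers of entries")
    out = []
    # Counter and timing are taken from the Pinyin file.
    for (num, timing, py_lines), (_, _, hz_lines) in zip(pinyin, hanzi):
        out.append("\n".join([num, timing] + py_lines + hz_lines))
    return "\n\n".join(out) + "\n"

pinyin_sample = "1\n00:00:49,000 --> 00:00:51,500\nYō! Lǐ yé lái la\n\n2\n00:00:52,200 --> 00:00:53,600\nLǐ yé lái la\n"
hanzi_sample = "1\n00:00:49,000 --> 00:00:51,500\n喲! 李爺來啦\n\n2\n00:00:52,200 --> 00:00:53,600\n李爺來啦\n"

print(combine(pinyin_sample, hanzi_sample))
```

With real files, you would read each one into a string first (opening it with an explicit encoding such as UTF-8 so the tone marks and Hanzi survive) and write the combined result out the same way.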
