Sentence Alignment of Historical Classics based on Mode Prediction and Term Translation Pairs

Parallel corpora are essential resources for the construction of bilingual term dictionary of historical classics. To obtain large-scale parallel corpora, this paper proposes a sentence alignment method based on mode prediction and term translation pairs. On one hand, the method rebuilds the sentence alignment process according to characteristics of the translation of historical classics, and adds mode prediction into the sentence alignment. On the other hand, due to the lack of bilingual ancient Chinese dictionary, the method exploits the term translation pairs extracted from manually aligned sentence pairs to perform alignment. The method first predicts the alignment mode probability according to the character number, punctuation number and some characters of Chinese sentence, then performs sentence alignment using length alignment probability, term alignment probability and mode probability. Besides, the method selects anchor sentence pairs based on sentence length and predicted mode to prevent the spread of alignment errors. The experiment on ”Shi Ji” demonstrates that mode prediction and term translation pair both enhance the performance of sentence alignment obviously.