High-performance bilingual text alignment using statistical and dictionary information

This paper describes an accurate and robust text alignment system for structurally different languages. For structurally different language pairs such as Japanese and English, the number of word correspondences that can be acquired statistically is limited, mainly because the systems of function (closed-class) words differ greatly between the two languages. The proposed method uses two kinds of word correspondences to align bilingual texts: a general-purpose bilingual dictionary, and word correspondences acquired statistically during the alignment process. The method determines corresponding sentence pairs (anchors) gradually, relaxing its parameters at each step. By combining the two kinds of word correspondences, it obtains enough word correspondences for complete alignment. As a result, texts of various lengths and genres in structurally different languages can be aligned with high precision. Experimental results show that our system outperforms conventional methods on various kinds of Japanese–English texts.
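The anchor-by-anchor procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the system evaluated in the paper: the similarity score, the Dice coefficient used for statistical acquisition, the threshold schedule, and all function names (`similarity`, `learn_pairs`, `align`) are assumptions chosen for brevity, and sentences are assumed to be pre-tokenized word lists.

```python
from collections import Counter
from itertools import product


def similarity(src, tgt, lexicon):
    """Share of distinct words on both sides that have a partner in the lexicon."""
    src_words, tgt_words = set(src), set(tgt)
    hit_src = {s for s in src_words if any((s, t) in lexicon for t in tgt_words)}
    hit_tgt = {t for t in tgt_words if any((s, t) in lexicon for s in src_words)}
    total = len(src_words) + len(tgt_words)
    return (len(hit_src) + len(hit_tgt)) / total if total else 0.0


def learn_pairs(src_sents, tgt_sents, anchors, min_dice):
    """Acquire word pairs from anchored sentences (assumed measure: Dice coefficient)."""
    src_freq, tgt_freq, both = Counter(), Counter(), Counter()
    for i, j in anchors:
        s_words, t_words = set(src_sents[i]), set(tgt_sents[j])
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        both.update(product(s_words, t_words))
    # Require at least two co-occurrences so a single anchor teaches nothing.
    return {
        (s, t)
        for (s, t), n in both.items()
        if n >= 2 and 2 * n / (src_freq[s] + tgt_freq[t]) >= min_dice
    }


def align(src_sents, tgt_sents, dictionary, thresholds=(0.8, 0.5, 0.3), min_dice=0.8):
    """Fix anchors pass by pass, relaxing the threshold and growing the lexicon."""
    lexicon = set(dictionary)
    anchors = []  # (source index, target index) pairs, kept sorted
    for threshold in thresholds:  # strict first, then progressively looser
        used_src = {i for i, _ in anchors}
        used_tgt = {j for _, j in anchors}
        candidates = []
        for i, src in enumerate(src_sents):
            if i in used_src:
                continue
            for j, tgt in enumerate(tgt_sents):
                if j in used_tgt:
                    continue
                score = similarity(src, tgt, lexicon)
                if score >= threshold:
                    candidates.append((score, i, j))
        # Best candidates first; accept only pairs that keep anchors monotonic.
        for _, i, j in sorted(candidates, reverse=True):
            if all((i - a) * (j - b) > 0 for a, b in anchors):
                anchors.append((i, j))
        anchors.sort()
        # Statistically acquired pairs join the dictionary pairs for the next pass.
        lexicon |= learn_pairs(src_sents, tgt_sents, anchors, min_dice)
    return anchors, lexicon


# Toy data (romanized Japanese); the dictionary lacks the verb pair.
ja = [["inu", "hashiru"], ["neko", "hashiru"], ["tori", "hashiru"]]
en = [["dog", "runs"], ["cat", "runs"], ["bird", "runs"]]
seed = {("inu", "dog"), ("neko", "cat")}
anchors, lexicon = align(ja, en, seed)
```

In this toy run the dictionary alone anchors only the first two sentence pairs. The pair ("hashiru", "runs") is then acquired from those anchors, which is what allows the third sentence pair to be anchored in the final, most relaxed pass.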
