Bilingual chunk alignment in statistical machine translation

This paper proposes a new algorithm, multilayer filtering (MLF), for automatically extracting aligned bilingual chunks from a Chinese-English parallel corpus. Successive layers extract bilingual chunks according to different features of the chunks in the corpus, and the extracted chunks are aligned one-to-one. Unlike most conventional algorithms, the chunking and alignment algorithm does not rely on tagging, parsing, syntactic analysis, or segmentation of the Chinese corpus. Preliminary experimental results show that the algorithm performs well in both chunking and alignment. Moreover, the translations generated with this algorithm are much better than those produced by the baseline, a word-based statistical machine translation system.
