Paper Information - A Data Mining Approach to Learn Reorder Rules for SMT

A Data Mining Approach to Learn Reorder Rules for SMT

In this paper, we describe a syntax-based source-side reordering method for phrase-based statistical machine translation (SMT) systems. The source side of the training corpus is first parsed, and reordering rules are then learned automatically from source-side phrases and word alignments. The source side of the training and test corpora is then reordered and passed to the SMT system. Reordering is a common problem in language pairs from distant language families. This paper describes an automated approach for learning reorder rules from a word-aligned parallel corpus using association rule mining. Reordered and generalized rules are the most significant in our approach. Our experiments were conducted on an English-Hindi EILMT corpus.
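The idea of mining reorder rules with association rule measures can be illustrated with a minimal sketch. It is not the paper's implementation: the toy observations, the function names `mine_reorder_rules` and `apply_rule`, and the thresholds are all hypothetical, and plain counting stands in for a frequent-pattern miner. Each observation pairs the child labels of a source parse node with the target-side order of those children implied by word alignments; a rule is kept when its support and confidence pass the thresholds.

```python
from collections import Counter

# Hypothetical toy observations, not data from the paper:
# (child labels of a source parse node, target-side order of those children).
observations = [
    (("VB", "NP"), (1, 0)),
    (("VB", "NP"), (1, 0)),
    (("VB", "NP"), (0, 1)),
    (("IN", "NP"), (1, 0)),
    (("IN", "NP"), (1, 0)),
    (("DT", "NN"), (0, 1)),
]


def mine_reorder_rules(observations, min_support=0.2, min_confidence=0.6):
    """Keep pattern -> order rules that pass support and confidence thresholds."""
    total = len(observations)
    pattern_counts = Counter(pattern for pattern, _ in observations)
    pair_counts = Counter(observations)
    rules = {}
    for (pattern, order), count in pair_counts.items():
        support = count / total                      # P(pattern, order)
        confidence = count / pattern_counts[pattern]  # P(order | pattern)
        is_reordering = order != tuple(range(len(pattern)))
        if is_reordering and support >= min_support and confidence >= min_confidence:
            rules[pattern] = (order, support, confidence)
    return rules


def apply_rule(children, labels, rules):
    """Reorder a node's children if a mined rule matches their labels."""
    rule = rules.get(tuple(labels))
    if rule is None:
        return list(children)
    order = rule[0]
    return [children[i] for i in order]


rules = mine_reorder_rules(observations)
reordered = apply_rule(["ate", "an apple"], ["VB", "NP"], rules)
```

With a confidence threshold above 0.5, at most one order can survive per pattern, so the rule table stays unambiguous. In this toy data the verb-object and preposition-noun-phrase patterns are swapped, while the determiner-noun pattern is left in its original order.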

Avinesh PVS
