Using RBMT Systems to Produce Bilingual Corpus for SMT

This paper proposes a method that uses existing Rule-based Machine Translation (RBMT) systems as black boxes to produce a synthetic bilingual corpus, which then serves as training data for a Statistical Machine Translation (SMT) system. The RBMT systems translate a monolingual corpus, and each source sentence paired with its machine translation forms the synthetic bilingual corpus. With this corpus, an SMT system can be built even when no real bilingual corpus is available. In our experiments, measured with BLEU, the resulting SMT system achieves a relative improvement of 11.7% over the best of the RBMT systems used to produce the synthetic bilingual corpora. We also interpolate the model trained on a real bilingual corpus with the models trained on the synthetic bilingual corpora. The interpolated model achieves an absolute improvement of 0.0245 BLEU (13.1% relative) over the model trained on the real bilingual corpus alone.
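
The two steps described in the abstract, producing a synthetic bilingual corpus and interpolating models, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `build_synthetic_corpus`, `interpolate_tables`, and `PhraseTable` are hypothetical, the RBMT system is represented by an arbitrary callable, and linear interpolation of phrase translation probabilities is an assumption, since the abstract does not say how the models are combined.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical model representation: (source phrase, target phrase) -> probability.
PhraseTable = Dict[Tuple[str, str], float]


def build_synthetic_corpus(
    source_sentences: Iterable[str],
    rbmt_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each monolingual sentence with its black-box RBMT translation."""
    pairs = []
    for sentence in source_sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty lines in the monolingual corpus
        translation = rbmt_translate(sentence).strip()
        if translation:
            pairs.append((sentence, translation))
    return pairs


def interpolate_tables(
    tables: List[PhraseTable],
    weights: List[float],
) -> PhraseTable:
    """Linearly interpolate phrase tables (assumed combination scheme).

    A phrase pair missing from a table contributes probability 0 for that table.
    """
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    merged: PhraseTable = {}
    for table, weight in zip(tables, weights):
        for phrase_pair, prob in table.items():
            merged[phrase_pair] = merged.get(phrase_pair, 0.0) + weight * prob
    return merged
```

In this sketch, `rbmt_translate` would wrap a call to whichever RBMT system is available, and the resulting sentence pairs would be passed to a standard SMT training pipeline in place of a real bilingual corpus. The interpolation weights would normally be tuned on held-out data.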
