Improving Pivot-Based Statistical Machine Translation Using Random Walk

This paper proposes a novel approach that uses a machine learning method to improve pivot-based statistical machine translation (SMT). For language pairs with little bilingual data, one possible solution is pivot-based SMT, which uses a third language as a "bridge" to generate source-target translations. However, one weakness of this approach is that some useful source-target translations cannot be generated when the corresponding source phrase and target phrase connect to different pivot phrases. To alleviate this problem, we use Markov random walks to connect possible translation phrases between the source and target languages. Experimental results on European Parliament data, spoken language data and web data show that our method yields significant improvements over the baseline system on all tasks.
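The gap described above can be illustrated on a toy phrase graph. What follows is a minimal sketch under stated assumptions, not the paper's model. The phrase tables, their probabilities, the node names (`s1`, `p1`, `t1`, ...) and the helper functions are all hypothetical. The sketch also treats each phrase-table entry as an undirected edge weight, which is a simplification. Standard triangulation marginalizes over shared pivot phrases, p(t|s) = Σ_p p(t|p) p(p|s), so a source phrase and a target phrase attached to different pivots receive zero probability. A multi-step Markov random walk over the joint source/pivot/target graph can still reach such a target.

```python
from collections import defaultdict

# Hypothetical toy phrase tables (illustrative values, not from the paper).
# Source-pivot table: p(pivot | source).
SRC_PIV = {
    ("s1", "p1"): 1.0,
    ("s2", "p1"): 0.5,
    ("s2", "p2"): 0.5,
}
# Pivot-target table: p(target | pivot).
PIV_TGT = {
    ("p1", "t1"): 1.0,
    ("p2", "t2"): 1.0,
}


def triangulate(src_piv, piv_tgt):
    """Pivot triangulation: p(t|s) = sum over shared pivots p of p(t|p) * p(p|s)."""
    table = defaultdict(float)
    for (s, p), p_ps in src_piv.items():
        for (q, t), p_tq in piv_tgt.items():
            if p == q:  # only pivots present in both tables contribute
                table[(s, t)] += p_ps * p_tq
    return dict(table)


def transition_matrix(edges):
    """Row-normalize an undirected weighted phrase graph into Markov transitions.

    Assumption (a simplification): each phrase-table entry is used as a
    symmetric edge weight between the two phrases.
    """
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    trans = [[0.0] * len(nodes) for _ in nodes]
    for (a, b), w in edges.items():
        trans[index[a]][index[b]] += w
        trans[index[b]][index[a]] += w
    for row in trans:
        total = sum(row)
        if total:
            row[:] = [w / total for w in row]
    return nodes, index, trans


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def walk_scores(edges, steps):
    """Return the `steps`-step transition matrix P^steps of the random walk."""
    nodes, index, trans = transition_matrix(edges)
    power = trans
    for _ in range(steps - 1):
        power = matmul(power, trans)
    return nodes, index, power


def induced_translation(edges, source, targets, steps):
    """Walk probability mass on target phrases after `steps` steps, renormalized."""
    _, index, power = walk_scores(edges, steps)
    row = power[index[source]]
    mass = {t: row[index[t]] for t in targets}
    total = sum(mass.values())
    return {t: m / total for t, m in mass.items()} if total else {}


if __name__ == "__main__":
    edges = {**SRC_PIV, **PIV_TGT}
    print(triangulate(SRC_PIV, PIV_TGT))  # ("s1", "t2") is missing: no shared pivot
    print(induced_translation(edges, "s1", ["t1", "t2"], 2))  # all mass on t1
    print(induced_translation(edges, "s1", ["t1", "t2"], 4))  # t2 gets nonzero mass
```

In this toy graph, `s1` and `t2` share no pivot phrase, so triangulation cannot produce the pair. The four-step path s1 → p1 → s2 → p2 → t2, however, gives it nonzero probability. Symmetric edge weights and a fixed walk length are simplifications chosen for brevity. A real system would keep the phrase tables' directional probabilities and tune how walk scores are combined with the triangulated table.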
