Discriminative Training of 150 Million Translation Parameters and Its Application to Pruning

Until recently, the application of discriminative training to log-linear statistical machine translation has been limited to tuning the weights of a small number of features or to training features with a limited number of parameters. In this paper, we propose to scale up the discriminative training approach of (He and Deng, 2012) to train features with 150 million parameters, one order of magnitude more than previously published efforts, and to apply discriminative training to redistribute the probability mass that is lost due to model pruning. Experimental results on the NIST MT06 test set confirm the effectiveness of our proposals over a strong baseline.
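The objective underlying this line of work (He and Deng, 2012) is the expected BLEU of the model's translations, which for one source sentence f with an n-best list e_1..e_K can be written as U(theta) = sum_k P_theta(e_k | f) * BLEU(e_k). The following Python sketch illustrates one gradient step of such an objective with respect to phrase-pair log-probabilities. It is only a toy illustration under simplifying assumptions, not the paper's actual procedure, and every name in it (NBestEntry, expected_bleu_gradient, and so on) is hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Hypothetical container for one n-best hypothesis: its sentence-level BLEU,
# the decoder's model score, and how often each phrase pair was used in its
# derivation. Phrase pairs are represented as (source, target) tuples.
@dataclass
class NBestEntry:
    bleu: float
    model_score: float
    phrase_counts: Counter

def expected_bleu_gradient(nbest, scale=1.0):
    """Expected BLEU over an n-best list and its gradient w.r.t. phrase-pair
    log-probabilities, assuming the model score is linear in those log-probs
    (the translation-model feature weight is folded into `scale`)."""
    # Posterior over the n-best list: P(e_k | f) proportional to exp(scale * score_k).
    max_score = max(e.model_score for e in nbest)
    weights = [math.exp(scale * (e.model_score - max_score)) for e in nbest]
    z = sum(weights)
    posteriors = [w / z for w in weights]

    # Expected BLEU under that posterior.
    u = sum(p * e.bleu for p, e in zip(posteriors, nbest))

    # dU / d log p(phrase) = scale * sum_k P_k * (BLEU_k - U) * count_k(phrase)
    grad = Counter()
    for p, e in zip(posteriors, nbest):
        for phrase, count in e.phrase_counts.items():
            grad[phrase] += scale * p * (e.bleu - u) * count
    return u, grad

def gradient_step(phrase_logprob, grad, lr=0.05):
    """One plain gradient-ascent step on the phrase-pair log-probabilities."""
    for phrase, g in grad.items():
        phrase_logprob[phrase] = phrase_logprob.get(phrase, 0.0) + lr * g

# Toy usage: two hypotheses for a single source sentence.
nbest = [
    NBestEntry(0.42, -10.3, Counter({("la maison", "the house"): 1})),
    NBestEntry(0.35, -10.9, Counter({("la maison", "house"): 1})),
]
phrase_logprob = {}
u, grad = expected_bleu_gradient(nbest)
gradient_step(phrase_logprob, grad)
```

Note that a plain gradient step like this does not keep the conditional phrase distributions normalized; the presence of reference [10] below suggests that the actual updates are growth-transformation style re-estimates, which preserve normalization, and it is presumably that same machinery that redistributes the probability mass lost to pruning over the surviving phrase pairs.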

[1] Philip Resnik et al. Online Large-Margin Training of Syntactic and Structural Translation Features. EMNLP, 2008.

[2] Hwee Tou Ng et al. Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms. EMNLP, 2008.

[3] Xiaodong He and Li Deng. Maximum Expected BLEU Training of Phrase and Lexicon Translation Models. ACL, 2012.

[4] David Chiang et al. Two Easy Improvements to Lexical Weighting. ACL, 2011.

[5] Chris Dyer et al. Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT. ACL, 2012.

[6] Bowen Zhou et al. Prior Derivation Models for Formally Syntax-Based Translation Using Linguistically Syntactic Parsing and Tree Kernels. SSST@ACL, 2008.

[7] Mark Hopkins et al. Tuning as Ranking. EMNLP, 2011.

[8] Philipp Koehn et al. Moses: Open Source Toolkit for Statistical Machine Translation. ACL, 2007.

[9] Joel D. Martin et al. Improving Translation Quality by Discarding Most of the Phrasetable. EMNLP, 2007.

[10] Dimitri Kanevsky et al. An Inequality for Rational Functions with Applications to Some Statistical Estimation Problems. IEEE Transactions on Information Theory, 1991.

[11] Hae-Chang Rim et al. Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation. ACL, 2012.

[12] Alexander H. Waibel et al. Translation Model Pruning via Usage Statistics for Statistical Machine Translation. HLT-NAACL, 2007.

[13] Marco Turchi et al. How Good Are Your Phrases? Assessing Phrase Quality with Single Class Classification. IWSLT, 2011.

[14] Hermann Ney et al. Discriminative Training and Maximum Entropy Models for Statistical Machine Translation. ACL, 2002.

[15] Peng Xu et al. A Systematic Comparison of Phrase Table Pruning Techniques. EMNLP, 2012.