Neural network-based reranking model for statistical machine translation

Non-local features play an important role in improving the performance of statistical machine translation (SMT). By introducing a hidden layer, a nonlinear neural network model can make better use of non-local features to improve translation performance. This paper therefore builds neural network-based reranking models that use non-local features to improve translation quality. We introduce two models, Reranker-WC and Reranker-D. Compared with the baseline system, Reranker-WC improves performance by about 1.4 BLEU points. We also find that the hyper-parameter λ affects the quality of the SMT output, and we achieve the best performance when λ is 40.
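
To make the idea of nonlinear reranking concrete, the following is a minimal sketch of n-best list reranking with a one-hidden-layer network. It is not the Reranker-WC or Reranker-D model: the abstract does not specify their architectures, features, or training procedure, so the tanh hidden layer, the feature vectors, the function names (`init_params`, `score`, `rerank`), and the hypothesis dictionary layout are all assumptions made for illustration. The hyper-parameter λ is not modeled because its role is not defined here.

```python
import math
import random


def init_params(n_features, n_hidden, seed=0):
    """Randomly initialise an (untrained) one-hidden-layer scorer.

    Hypothetical helper: returns (hidden weights, hidden biases, output weights).
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
          for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    return w1, b1, w2


def score(features, params):
    """Nonlinear score: tanh hidden layer followed by a linear output unit."""
    w1, b1, w2 = params
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden))


def rerank(nbest, params):
    """Return the n-best hypotheses sorted by descending network score."""
    return sorted(nbest,
                  key=lambda hyp: score(hyp["features"], params),
                  reverse=True)


if __name__ == "__main__":
    # Hand-set weights and made-up feature vectors, purely for illustration.
    params = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, -1.0])
    nbest = [
        {"text": "hypothesis b", "features": [0.0, 2.0]},
        {"text": "hypothesis c", "features": [1.0, 1.0]},
        {"text": "hypothesis a", "features": [2.0, 0.0]},
    ]
    for hyp in rerank(nbest, params):
        print(hyp["text"], round(score(hyp["features"], params), 3))
```

In a real system the feature vector for each hypothesis would combine the decoder's model scores with non-local features computed over the whole translation, and the weights would be learned from n-best lists rather than set by hand.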
