A swarm-inspired re-ranker system for statistical machine translation

Abstract

Re-ranking algorithms have recently been applied successfully to statistical machine translation (SMT) systems. Decoding may produce ungrammatical or non-fluent output for several reasons: errors in hypothesis alignment, differences in word order between source and target sentences, and a lack of sufficient resources such as parallel corpora. This paper proposes a re-ranking system based on swarm algorithms that uses sophisticated non-syntactic features to re-rank the n-best translation candidates. We introduce a large set of easily computed non-syntactic features to address SMT system errors. We use the quantum-behaved particle swarm optimization (QPSO) algorithm to tune the feature weights. We evaluate the proposed approach on two translation tasks that cover different language pairs (Persian → English and German → English) and genres (news and novels). We compare it with re-ranking systems trained with standard PSO, a genetic algorithm (GA), the perceptron, and the averaged perceptron. In these experiments, the proposed system achieves better translation quality on both tasks. We also analyze the impact of the proposed features on translation quality and identify the features that contribute most positively. Finally, we investigate how the size of the n-best list affects the proposed system.
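The abstract describes two steps. First, the n-best candidates are re-ranked with a weighted combination of features. Second, QPSO tunes those weights. The sketch below illustrates that general idea and is not the authors' implementation. It rests on several assumptions:

* The model scores each candidate linearly.
* Each dev-set candidate carries precomputed feature values and a per-candidate quality score. The quality score stands in for something like sentence-level BLEU. The paper's actual features and tuning objective are not given here.
* The QPSO update is the commonly described form. It uses a mean-best position and a contraction-expansion coefficient that decreases linearly from 1.0 to 0.5.

The names `rerank`, `fitness` and `qpso`, and all default parameter values, are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# One n-best list: (feature_vector, quality) per candidate. "quality" is an
# assumed stand-in for a per-candidate metric such as sentence-level BLEU.
NBest = List[Tuple[Sequence[float], float]]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear model score: weighted sum of a candidate's features."""
    return sum(w * f for w, f in zip(weights, features))


def rerank(weights: Sequence[float], nbest: NBest) -> int:
    """Index of the highest-scoring candidate (first one wins ties)."""
    return max(range(len(nbest)), key=lambda i: score(weights, nbest[i][0]))


def fitness(weights: Sequence[float], dev: List[NBest]) -> float:
    """Mean quality of the candidates the re-ranker selects on a dev set."""
    return sum(nb[rerank(weights, nb)][1] for nb in dev) / len(dev)


def qpso(dev: List[NBest], dim: int, n_particles: int = 20, iters: int = 100,
         beta_start: float = 1.0, beta_end: float = 0.5,
         seed: int = 0) -> List[float]:
    """Tune feature weights by maximizing `fitness` with quantum-behaved PSO."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                      # personal best positions
    pfit = [fitness(x, dev) for x in xs]            # personal best fitness
    g = max(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]              # global best

    for t in range(iters):
        # Contraction-expansion coefficient, decreased linearly over time.
        beta = beta_start - (beta_start - beta_end) * t / max(iters - 1, 1)
        # Mean of all personal-best positions ("mbest").
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i, x in enumerate(xs):
            for d in range(dim):
                phi = rng.random()
                # Local attractor between the personal and global bests.
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                u = 1.0 - rng.random()              # in (0, 1], so log is finite
                step = beta * abs(mbest[d] - x[d]) * math.log(1.0 / u)
                x[d] = attractor + step if rng.random() < 0.5 else attractor - step
            f = fitness(x, dev)
            if f > pfit[i]:
                pbest[i], pfit[i] = x[:], f
                if f > gfit:
                    gbest, gfit = x[:], f
    return gbest


if __name__ == "__main__":
    # Toy dev set with two features per candidate (invented for illustration).
    dev = [
        [([0.9, 0.1], 0.8), ([0.2, 0.9], 0.3), ([0.5, 0.5], 0.5)],
        [([0.1, 0.8], 0.2), ([0.7, 0.3], 0.7)],
    ]
    w = qpso(dev, dim=2)
    print(w, fitness(w, dev))
```

In this toy dev set, feature 0 tracks quality. Any weight vector that favors feature 0 therefore selects the best candidate in both lists, and the mean quality reaches 0.75. Because the fitness function only calls `rerank`, it can be swapped for a corpus-level metric without touching the optimizer.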
