Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014

This paper describes the Nara Institute of Science and Technology’s (NAIST) submission to the four translation tasks of the 2014 Workshop on Asian Translation (WAT). All systems are based on forest-to-string (F2S) translation, in which the input sentence is first parsed with a syntactic parser, and a forest of possible syntactic analyses is then translated into the target language. On top of the baseline F2S system, we add rescoring with a recurrent neural network language model (RNNLM), which allows for more fluent output. The resulting system achieved the highest results in both automatic and manual evaluation for all four language pairs targeted by the workshop.
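As a rough illustration of the rescoring step, the sketch below reranks an n-best list by adding a weighted language model score to each hypothesis's decoder score. It is a minimal sketch under stated assumptions, not the system's actual implementation: the names `rescore_nbest`, `toy_lm_logprob`, and `lm_weight` are hypothetical, the toy scorer merely stands in for an RNNLM, and the log-linear combination with a single weight is assumed for simplicity.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is (translation text, decoder score in the log domain).
Hypothesis = Tuple[str, float]


def toy_lm_logprob(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for an RNNLM log-probability.

    Charges -1.0 per token and an extra -5.0 for each immediately
    repeated token, so disfluent repetitions score lower.
    """
    score = -1.0 * len(tokens)
    for prev, cur in zip(tokens, tokens[1:]):
        if prev == cur:
            score -= 5.0
    return score


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[Sequence[str]], float],
    lm_weight: float = 0.5,
) -> str:
    """Return the hypothesis with the best combined score.

    Combined score = decoder score + lm_weight * LM log-probability
    (an assumed log-linear combination).
    """
    if not nbest:
        raise ValueError("n-best list is empty")

    def combined(hyp: Hypothesis) -> float:
        text, decoder_score = hyp
        return decoder_score + lm_weight * lm_logprob(text.split())

    return max(nbest, key=combined)[0]


# Two candidate outputs from a (hypothetical) F2S decoder.
nbest = [("the the cat sat", -2.0), ("the cat sat", -2.5)]
best = rescore_nbest(nbest, toy_lm_logprob, lm_weight=0.5)
```

With `lm_weight=0.0` the decoder's own top hypothesis is returned unchanged; a positive weight lets the language model promote the more fluent candidate. In practice such a weight would be tuned on held-out data rather than fixed by hand.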
