DCU-Lingo24 Participation in WMT 2014 Hindi-English Translation Task

This paper describes the DCU-Lingo24 submission to WMT 2014 for the Hindi-English translation task. Our system combines several methods: Context-Informed PB-SMT, OOV Word Conversion (OWC), Multi-Alignment Combination (MAC), Operation Sequence Model (OSM), Stemming Align and Normal Phrase Extraction (SANPE), and Language Model Interpolation (LMI). We also describe the preprocessing steps we tried for Hindi in this task.
