Neural Probabilistic Language Model for System Combination

This paper describes the submission of the neural probabilistic language model (NPLM) team of Dublin City University to the system combination task at the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12). We supplied the information obtained by an NPLM as meta information to the system combination module. On the Spanish-English data, our paraphrasing approach achieved 25.81 BLEU points, 0.19 BLEU points absolute below the standard confusion network-based system combination. We note that our current usage of the NPLM is quite limited, owing to the difficulty of integrating an NPLM with system combination.
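The NPLM the abstract refers to follows the Bengio-style architecture: embed a fixed-length word context, pass the concatenated embeddings through a hidden layer, and take a softmax over the vocabulary. The sketch below is a minimal illustration of that scoring step, not the authors' implementation; the class, parameter sizes, and untrained random weights are all hypothetical, and a real system would train the weights and feed the resulting hypothesis scores to the combination module.

```python
import numpy as np

class TinyNPLM:
    """Minimal Bengio-style neural probabilistic LM (illustrative only):
    embed an n-gram context, one tanh hidden layer, softmax over the vocab."""

    def __init__(self, vocab_size, context_len=2, dim=8, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.n = context_len
        self.C = rng.normal(0.0, 0.1, (vocab_size, dim))        # word embeddings
        self.H = rng.normal(0.0, 0.1, (context_len * dim, hidden))
        self.U = rng.normal(0.0, 0.1, (hidden, vocab_size))

    def log_probs(self, context_ids):
        """Log P(next word | context) for every vocabulary word."""
        x = self.C[context_ids].reshape(-1)     # concatenate context embeddings
        h = np.tanh(x @ self.H)                 # hidden layer
        logits = h @ self.U
        logits -= logits.max()                  # numerical stability
        return logits - np.log(np.exp(logits).sum())  # log-softmax

    def score(self, ids):
        """Sum of per-word log-probabilities of a word-ID sequence."""
        return sum(self.log_probs(ids[i - self.n:i])[ids[i]]
                   for i in range(self.n, len(ids)))
```

In a system combination setting of the kind the paper describes, such a `score` over each candidate hypothesis could serve as the "meta information" feature handed to the confusion network-based combiner; how the authors actually wire the score in is not specified in the abstract.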
