Paper Information - Neural Networks Classifier for Data Selection in Statistical Machine Translation

Abstract: Corpora are precious resources, as they allow for proper estimation of statistical machine translation models. Data selection is a form of domain adaptation that aims to extract, from an out-of-domain corpus, the sentences that are most useful for translating a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, that can deal with both monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality than a state-of-the-art method (cross-entropy) while requiring substantially less data. Moreover, the results are consistent across different language pairs, demonstrating the robustness of our proposal.
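To make the idea of data selection as classification concrete, the following is a minimal sketch and not the method of the paper. It assumes a bag-of-words logistic regression in place of the neural network classifier, and all function names, the toy sentences, and the hyperparameters are hypothetical. The classifier is trained to separate in-domain sentences from a sample of out-of-domain sentences, and the out-of-domain pool is then ranked by predicted in-domain probability.

```python
import math
from collections import defaultdict

def tokenize(sentence):
    # Assumption: lowercased whitespace tokens, used as a set of binary features.
    return set(sentence.lower().split())

def score(weights, bias, sentence):
    """Estimated probability that a sentence is in-domain."""
    z = bias + sum(weights.get(tok, 0.0) for tok in tokenize(sentence))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train(in_domain, out_sample, epochs=200, lr=0.5):
    """Logistic regression trained with plain SGD (stand-in for a neural classifier)."""
    weights, bias = defaultdict(float), 0.0
    data = [(s, 1.0) for s in in_domain] + [(s, 0.0) for s in out_sample]
    for _ in range(epochs):
        for sentence, label in data:
            grad = score(weights, bias, sentence) - label
            for tok in tokenize(sentence):
                weights[tok] -= lr * grad
            bias -= lr * grad
    return weights, bias

def select(weights, bias, pool, n):
    """Keep the n pool sentences that the classifier rates most in-domain."""
    return sorted(pool, key=lambda s: score(weights, bias, s), reverse=True)[:n]

# Hypothetical toy data: a medical in-domain set and a parliamentary out-of-domain sample.
in_domain = [
    "the patient received a dose of the drug",
    "side effects of the drug include nausea",
]
out_sample = [
    "the parliament voted on the budget",
    "the committee approved the report",
]
pool = [
    "the drug dose was reduced for the patient",
    "the parliament adopted the report",
    "nausea is a common side effect",
]

weights, bias = train(in_domain, out_sample)
selected = select(weights, bias, pool, n=2)
print(selected)
```

In this sketch the selected subset would be added to the training data of the translation system. A bilingual variant could, as one assumption, concatenate source-side and target-side features before classification.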
