Extracting Parallel Sentences with Bidirectional Recurrent Neural Networks to Improve Machine Translation

Parallel sentence extraction is a task addressing the data sparsity problem found in multilingual natural language processing applications. We propose a bidirectional recurrent neural network based approach to extract parallel sentences from collections of multilingual texts. Our experiments with noisy parallel corpora show that we can achieve promising results against a competitive baseline by removing the need of specific feature engineering or additional external resources. To justify the utility of our approach, we extract sentence pairs from Wikipedia articles to train machine translation systems and show significant improvements in translation performance.

[1]  Josef van Genabith,et al.  An Empirical Analysis of NMT-Derived Interlingual Embeddings and Their Use in Parallel Sentence Identification , 2017, IEEE Journal of Selected Topics in Signal Processing.

[2]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[3]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[4]  Pabitra Mitra,et al.  Bilingual Word Embeddings with Bucketed CNN for Parallel Sentence Extraction , 2017, ACL.

[5]  Maarten de Rijke,et al.  Finding Similar Sentences across Multiple Languages in Wikipedia , 2006 .

[6]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[7]  Christopher D. Manning,et al.  Bilingual Word Representations with Monolingual Quality in Mind , 2015, VS@HLT-NAACL.

[8]  Guillaume Lample,et al.  Word Translation Without Parallel Data , 2017, ICLR.

[9]  Wojciech Zaremba,et al.  Recurrent Neural Network Regularization , 2014, ArXiv.

[10]  Aaron C. Courville,et al.  MINE: Mutual Information Neural Estimation , 2018, ArXiv.

[11]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[12]  Hermann Ney,et al.  HMM-Based Word Alignment in Statistical Translation , 1996, COLING.

[13]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[14]  Dragos Stefan Munteanu,et al.  Improving Machine Translation Performance by Exploiting Non-Parallel Corpora , 2005, CL.

[15]  Philipp Koehn,et al.  Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.

[16]  Tie-Yan Liu,et al.  Dual Learning for Machine Translation , 2016, NIPS.

[17]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[18]  Eneko Agirre,et al.  Unsupervised Neural Machine Translation , 2017, ICLR.

[19]  Holger Schwenk,et al.  On the Use of Comparable Corpora to Improve SMT performance , 2009, EACL.

[20]  Kristina Toutanova,et al.  Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment , 2010, NAACL.

[21]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[22]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[23]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[24]  Alexander M. Rush,et al.  OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.

[25]  Chenhui Chu,et al.  Parallel Sentence Extraction from Comparable Corpora with Neural Network Features , 2016, LREC.

[26]  Samuel L. Smith,et al.  Offline bilingual word vectors, orthogonal transformations and the inverted softmax , 2017, ICLR.

[27]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[28]  Kenneth Heafield,et al.  KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[29]  Thierry Etchegoyhen,et al.  Weighted Set-Theoretic Alignment of Comparable Sentences , 2017, BUCC@ACL.

[30]  Yoshua Bengio,et al.  BilBOWA: Fast Bilingual Distributed Representations without Word Alignments , 2014, ICML.

[31]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[32]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[33]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[34]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[35]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[36]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[37]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.