Learning Translation Models from Monolingual Continuous Representations

Translation models often fail to generate good translations for infrequent words or phrases. Previous work attacked this problem by inducing new translation rules from monolingual data with a semi-supervised algorithm. However, this approach does not scale very well since it is very computationally expensive to generate new translation rules for only a few thousand sentences. We propose a much faster and simpler method that directly hallucinates translation rules for infrequent phrases based on phrases with similar continuous representations for which a translation is known. To speed up the retrieval of similar phrases, we investigate approximated nearest neighbor search with redundant bit vectors which we find to be three times faster and significantly more accurate than locality sensitive hashing. Our approach of learning new translation rules improves a phrase-based baseline by up to 1.6 BLEU on Arabic-English translation, it is three-orders of magnitudes faster than existing semi-supervised methods and 0.5 BLEU more accurate.

[1]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[2]  Christopher D. Manning,et al.  Bilingual Word Embeddings for Phrase-Based Machine Translation , 2013, EMNLP.

[3]  Reinhard Rapp,et al.  Identifying Word Translations in Non-Parallel Texts , 1995, ACL.

[4]  Mirella Lapata,et al.  Composition in Distributional Models of Semantics , 2010, Cogn. Sci..

[5]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[6]  Ming Zhou,et al.  Learning Translation Consensus with Structured Label Propagation , 2012, ACL.

[7]  Xiaofei He,et al.  Locality Preserving Projections , 2003, NIPS.

[8]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[9]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[10]  Dan Klein,et al.  Learning Bilingual Lexicons from Monolingual Corpora , 2008, ACL.

[11]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[12]  Miles Osborne,et al.  Streaming First Story Detection with application to Twitter , 2010, NAACL.

[13]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[14]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[15]  Patrick Pantel,et al.  Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering , 2005, ACL.

[16]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[17]  Jeffrey Pennington,et al.  Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection , 2011, NIPS.

[18]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[19]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[20]  Jianfeng Gao,et al.  Learning Continuous Phrase Representations for Translation Modeling , 2014, ACL.

[21]  Jonathan Goldstein,et al.  Redundant Bit Vectors for Quickly Searching High-Dimensional Regions , 2004, Deterministic and Statistical Methods in Machine Learning.

[22]  Quoc V. Le,et al.  Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.

[23]  Kristina Toutanova,et al.  Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data , 2014, ACL.

[24]  Christopher D. Manning,et al.  A Simple and Effective Hierarchical Phrase Reordering Model , 2008, EMNLP.

[25]  Philipp Koehn,et al.  Improved Statistical Machine Translation Using Paraphrases , 2006, NAACL.

[26]  Hal Daumé,et al.  Fast Large-Scale Approximate Graph Construction for NLP , 2012, EMNLP.

[27]  Geoffrey Zweig,et al.  Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.

[28]  Ming Zhou,et al.  Bilingually-constrained Phrase Embeddings for Machine Translation , 2014, ACL.

[29]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.