An Autoencoder Approach to Learning Bilingual Word Representations

Cross-language learning allows one to use training data from one language to build models for a different language. Many approaches to bilingual learning require that we have word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are coherent between two languages, while not relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). In experiments on 3 language pairs, we show that our approach achieves state-of-the-art performance, outperforming a method exploiting word alignments and a strong machine translation baseline.

[1]  Michael L. Littman,et al.  Automatic Cross-Language Retrieval Using Latent Semantic Indexing , 1997 .

[2]  David Yarowsky,et al.  Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora , 2001, NAACL.

[3]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[4]  Philipp Koehn,et al.  Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.

[5]  Yoshua Bengio,et al.  Hierarchical Probabilistic Neural Network Language Model , 2005, AISTATS.

[6]  Rada Mihalcea,et al.  Learning Multilingual Subjective Language via Cross-Lingual Projections , 2007, ACL.

[7]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[8]  Geoffrey E. Hinton,et al.  A Scalable Hierarchical Distributed Language Model , 2008, NIPS.

[9]  Mirella Lapata,et al.  Cross-lingual Annotation Projection for Semantic Roles , 2009, J. Artif. Intell. Res..

[10]  Ewan Klein,et al.  Natural Language Processing with Python , 2009 .

[11]  Xiaojun Wan,et al.  Co-Training for Cross-Lingual Sentiment Classification , 2009, ACL.

[12]  John C. Platt,et al.  Translingual Document Representations from Discriminative Projections , 2010, EMNLP.

[13]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[14]  Slav Petrov,et al.  Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections , 2011, ACL.

[15]  Yoshua Bengio,et al.  Large-Scale Learning of Embeddings with Reconstruction Sampling , 2011, ICML.

[16]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[17]  John C. Platt,et al.  Learning Discriminative Projections for Text Similarity Measures , 2011, CoNLL.

[18]  Bing Liu,et al.  Sentiment Analysis and Opinion Mining , 2012, Synthesis Lectures on Human Language Technologies.

[19]  Hugo Larochelle,et al.  A Neural Autoregressive Topic Model , 2012, NIPS.

[20]  Ivan Titov,et al.  Inducing Crosslingual Distributed Representations of Words , 2012, COLING.

[21]  Quoc V. Le,et al.  Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.

[22]  Christopher D. Manning,et al.  Bilingual Word Embeddings for Phrase-Based Machine Translation , 2013, EMNLP.

[23]  Andrew Y. Ng,et al.  Parsing with Compositional Vector Grammars , 2013, ACL.

[24]  Phil Blunsom,et al.  Multilingual Distributed Representations without Word Alignment , 2013, ICLR 2014.

[25]  Jianfeng Gao,et al.  Learning Semantic Representations for the Phrase Translation Model , 2013, ArXiv.

[26]  Jianfeng Gao,et al.  Learning Continuous Phrase Representations for Translation Modeling , 2014, ACL.

[27]  Phil Blunsom,et al.  Multilingual Models for Compositional Distributed Semantics , 2014, ACL.

[28]  Manaal Faruqui,et al.  Improving Vector Space Word Representations Using Multilingual Correlation , 2014, EACL.

[29]  Peter Kulchyski and , 2015 .

[30]  Philipp Koehn,et al.  Synthesis Lectures on Human Language Technologies , 2016 .