Embedding Learning Through Multilingual Concept Induction

We present a new method for estimating vector space representations of words: embedding learning by concept induction. We test this method on a highly parallel corpus and learn semantic representations of words in 1259 different languages in a single common space. An extensive experimental evaluation on crosslingual word similarity and sentiment analysis indicates that concept-based multilingual embedding learning performs better than previous approaches.
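The core idea — that words from many languages grouped under shared concepts end up in one common vector space — can be illustrated with a minimal sketch. The concept clusters below are hypothetical toy data (the induction step itself is not shown), and each word's vector is simply its membership profile over concept IDs, so translations that share concepts land near each other:

```python
from collections import defaultdict
import math

# Hypothetical induced concepts: each concept groups words across languages.
# Real concept induction from a parallel corpus is not shown here.
concepts = {
    "C1": ["eng:water", "deu:wasser", "fra:eau"],
    "C2": ["eng:fire", "deu:feuer", "fra:feu"],
    "C3": ["eng:water", "deu:wasser", "fra:mer"],  # overlapping cluster
}

def concept_vectors(concepts):
    """Represent each word as its membership vector over concept IDs."""
    ids = sorted(concepts)
    vecs = defaultdict(lambda: [0.0] * len(ids))
    for j, cid in enumerate(ids):
        for word in concepts[cid]:
            vecs[word][j] = 1.0
    return dict(vecs)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

vecs = concept_vectors(concepts)
# Translations share concepts, so they are close in the common space:
print(cosine(vecs["eng:water"], vecs["deu:wasser"]))  # 1.0
print(cosine(vecs["eng:water"], vecs["eng:fire"]))    # 0.0
```

In practice one would train dense embeddings (e.g. skip-gram) over concept-annotated text rather than use raw membership vectors, but the sketch shows why a single space covering over a thousand languages is possible: the concept IDs, not the languages, define the dimensions.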
