Wasserstein distances for evaluating cross-lingual embeddings

Word embeddings are high-dimensional vector representations of words that capture their semantic similarity in the vector space. Several algorithms exist for learning such embeddings, both for a single language and for several languages jointly. In this work we propose to evaluate collections of embeddings by adapting downstream natural language tasks to the optimal transport framework. We show how the family of Wasserstein distances can be used to solve the cross-lingual document retrieval and cross-lingual document classification problems. We argue for the advantages of this approach compared to more traditional embedding evaluation methods such as bilingual lexicon induction. Our experimental results suggest that using Wasserstein distances on these problems outperforms several strong baselines and performs on par with state-of-the-art models.
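
The abstract does not include the authors' implementation; the following is a minimal sketch of how an entropy-regularized Wasserstein (Sinkhorn) distance between two documents might be computed from cross-lingual word embeddings, in the spirit of Word Mover's Distance. The function names (`sinkhorn_distance`, `document_distance`), the uniform token weights, and the cosine cost matrix are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np


def sinkhorn_distance(a, b, M, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport cost between histograms a and b."""
    K = np.exp(-M / reg)              # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):          # Sinkhorn-Knopp scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return float(np.sum(P * M))


def document_distance(emb_src, emb_tgt, reg=0.1):
    """Regularized Wasserstein distance between two bags of word embeddings.

    emb_src: (n, d) source-language word vectors, one row per token.
    emb_tgt: (m, d) target-language word vectors in the same cross-lingual space.
    """
    # Uniform weight on each token; term frequencies could be used instead.
    a = np.full(emb_src.shape[0], 1.0 / emb_src.shape[0])
    b = np.full(emb_tgt.shape[0], 1.0 / emb_tgt.shape[0])
    # Cosine costs between every source and target word vector.
    src = emb_src / np.linalg.norm(emb_src, axis=1, keepdims=True)
    tgt = emb_tgt / np.linalg.norm(emb_tgt, axis=1, keepdims=True)
    M = 1.0 - src @ tgt.T
    return sinkhorn_distance(a, b, M, reg=reg)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.normal(size=(5, 300))   # stand-in for a source-language document
    corpus = [rng.normal(size=(rng.integers(4, 9), 300)) for _ in range(3)]
    # Retrieval: rank target-language documents by their distance to the query.
    scores = [document_distance(query, doc) for doc in corpus]
    print(np.argsort(scores))
```

Under these assumptions, cross-lingual retrieval reduces to ranking target-language documents by this distance to the query, and classification can use it as the metric in a nearest-neighbour scheme.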
