Semantic Drift in Multilingual Representations

Multilingual representations have mostly been evaluated based on their performance on specific tasks. In this article, we look beyond engineering goals and analyze the relations between languages in computational representations. We introduce a methodology for comparing languages based on their organization of semantic concepts. We propose to conduct an adapted version of representational similarity analysis of a selected set of concepts in computational multilingual representations. Using this analysis method, we can reconstruct a phylogenetic tree that closely resembles those assumed by linguistic experts. These results indicate that multilingual distributional representations that are only trained on monolingual text and bilingual dictionaries preserve relations between languages without the need for any etymological information. In addition, we propose a measure to identify semantic drift between language families. We perform experiments on word-based and sentence-based multilingual models and provide both quantitative results and qualitative examples. Analyses of semantic drift in multilingual representations can serve two purposes: They can indicate unwanted characteristics of the computational models and they provide a quantitative means to study linguistic phenomena across languages.
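
The comparison of languages by their organization of concepts can be illustrated with a minimal sketch of generic representational similarity analysis. This is not necessarily the adapted variant proposed in the article. Everything here is an assumption made for illustration: the toy random vectors standing in for aligned concept embeddings, the language names `lang_a`, `lang_b`, `lang_c`, the choice of cosine similarity and Spearman correlation, and the use of Ward linkage to build the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def concept_similarity(vectors):
    """Cosine similarity between all pairs of concept vectors (one row per concept)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


def language_distances(embeddings):
    """Compare languages by how similarly they organize the same set of concepts.

    `embeddings` maps a language name to an array of shape (n_concepts, dim),
    with row i representing the same concept in every language (an assumption).
    """
    langs = sorted(embeddings)
    n_concepts = embeddings[langs[0]].shape[0]
    upper = np.triu_indices(n_concepts, k=1)  # each concept pair counted once
    profiles = [concept_similarity(embeddings[lang])[upper] for lang in langs]

    dist = np.zeros((len(langs), len(langs)))
    for i in range(len(langs)):
        for j in range(i + 1, len(langs)):
            rho, _ = spearmanr(profiles[i], profiles[j])
            dist[i, j] = dist[j, i] = 1.0 - rho  # high correlation = small distance
    return langs, dist


# Toy data: lang_a and lang_b are noisy copies of one space, lang_c is unrelated.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 8))
embeddings = {
    "lang_a": base + 0.05 * rng.normal(size=base.shape),
    "lang_b": base + 0.05 * rng.normal(size=base.shape),
    "lang_c": rng.normal(size=(20, 8)),
}

langs, dist = language_distances(embeddings)

# Hierarchical clustering over the language distances yields a tree structure.
# Ward linkage formally assumes Euclidean distances, so this is only a heuristic here.
tree = linkage(squareform(dist), method="ward")
```

Because only the pattern of similarities among concepts within each language is compared, this kind of analysis does not require the languages to share a vector space dimension by dimension. The resulting `tree` can be drawn with `scipy.cluster.hierarchy.dendrogram` and inspected against an expert-assumed phylogeny.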
