SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change

This paper describes SChME (Semantic Change Detection with Model Ensemble), a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change. SChME uses a model ensemble that combines signals from distributional models (word embeddings) and word frequency models, where each model casts a vote indicating the probability that a word underwent semantic change according to that feature. More specifically, the input signals to our model are the cosine distance of word vectors, a neighborhood-based metric we named Mapped Neighborhood Distance (MAP), and a word frequency differential metric. Additionally, we explore alignment-based methods to investigate the importance of the landmarks used in this process. Our results show evidence that the number of landmarks used for alignment has a direct impact on the predictive performance of the model. Moreover, we show that languages that undergo less semantic change tend to benefit from using a large number of landmarks, whereas languages with more semantic change benefit from a more careful choice of the number of landmarks used for alignment.
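To make the role of landmarks concrete, the following is a minimal sketch, not the SChME implementation. It assumes that alignment is done with an orthogonal Procrustes rotation fitted only on the landmark words, and that the change score is the cosine distance between the aligned vectors. The function names, the toy data, and the choice of landmarks are all hypothetical; the MAP metric, the frequency signal, and the voting step are not shown.

```python
import numpy as np

def procrustes_rotation(src, tgt):
    """Orthogonal Q minimizing ||src @ Q - tgt||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def cosine_distance(a, b):
    """Row-wise cosine distance between two matrices of equal shape."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return 1.0 - num / den

def change_scores(emb_t1, emb_t2, landmark_idx):
    """Fit the rotation on landmark rows only, then score every word."""
    q = procrustes_rotation(emb_t1[landmark_idx], emb_t2[landmark_idx])
    return cosine_distance(emb_t1 @ q, emb_t2)

# Toy data (assumption): period 2 is a rotated copy of period 1,
# except word 0, whose vector is flipped to simulate a drastic change.
rng = np.random.default_rng(0)
emb_t1 = rng.normal(size=(50, 8))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
emb_t2 = emb_t1 @ rotation
emb_t2[0] = -emb_t2[0]

landmarks = np.arange(1, 50)  # hypothetical choice: every word but word 0
scores = change_scores(emb_t1, emb_t2, landmarks)
```

In this toy setting the stable words score close to 0 and the flipped word scores close to 2. Varying `landmarks` (for example, including word 0 or using only a handful of words) is one way to see how the landmark set influences the resulting scores.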
