Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change

State-of-the-art models of lexical semantic change detection suffer from noise stemming from vector space alignment. We empirically test the Temporal Referencing method for lexical semantic change and show that, by avoiding alignment, it is less affected by this noise. When trained on a diachronic corpus, the skip-gram with negative sampling (SGNS) architecture with temporal referencing outperforms alignment models on a synthetic task as well as on a manual test set. We also introduce a principled way to simulate lexical semantic change and to systematically control for possible biases.
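The abstract does not spell out how temporal referencing avoids alignment. The sketch below assumes the common formulation of the method. A target word is replaced by a period-tagged token (e.g. `plane_1900`) only when it is in the centre (target) position of a skip-gram window. Context words always stay untagged. Because every period then shares one context space, a single model is trained on the whole corpus, and no alignment between separately trained spaces is needed. Change is then scored as the cosine distance between a word's tagged vectors. SGNS training itself is omitted here. The function names are hypothetical, and the vectors in the usage example are invented toy values, not model output.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def temporal_reference_pairs(
    corpus: Iterable[Tuple[str, Sequence[str]]],
    targets: Set[str],
    window: int = 2,
) -> List[Tuple[str, str]]:
    """Build (centre, context) skip-gram pairs with temporal referencing.

    `corpus` is a sequence of (period, tokens) items. A target word in the
    centre position becomes "<word>_<period>"; context words are never
    tagged, so all periods share a single context space.
    """
    pairs: List[Tuple[str, str]] = []
    for period, tokens in corpus:
        for i, word in enumerate(tokens):
            # Tag only target words, and only in the centre position.
            centre = f"{word}_{period}" if word in targets else word
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((centre, tokens[j]))  # context stays untagged
    return pairs


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def change_score(
    vectors: Dict[str, Sequence[float]], word: str, t1: str, t2: str
) -> float:
    """Distance between a word's period-tagged vectors in one shared space."""
    return cosine_distance(vectors[f"{word}_{t1}"], vectors[f"{word}_{t2}"])


if __name__ == "__main__":
    corpus = [("1800", ["the", "plane", "flew"]), ("1900", ["a", "plane", "landed"])]
    for pair in temporal_reference_pairs(corpus, {"plane"}, window=1):
        print(pair)
    # Toy vectors standing in for trained SGNS embeddings.
    toy = {"plane_1800": [1.0, 0.0], "plane_1900": [0.0, 1.0]}
    print(change_score(toy, "plane", "1800", "1900"))  # → 1.0
```

Note that `plane` still appears untagged as a context word for its neighbours. This design choice keeps both periods anchored to the same context vectors. An alignment model, by contrast, trains one space per period and must rotate them onto each other before comparing.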
