A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains

We perform an interdisciplinary large-scale evaluation of methods for detecting lexical semantic divergences in two tasks, one diachronic and one synchronic: changes in word sense across time, and differences in word sense across domains. Our work addresses the superficiality of, and lack of comparison in, the assessment of models of diachronic lexical change. We do so by bringing together and extending benchmark models on a common state-of-the-art evaluation task. In addition, we demonstrate that the same evaluation task and modelling approaches can be successfully applied to the synchronic detection of domain-specific sense divergences in the field of term extraction.
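The code below is a minimal sketch of one widely used pipeline for this kind of task. It assumes word vectors have already been trained separately on two corpora over a shared vocabulary. The two corpora could be two time periods, or a general corpus and a domain-specific one. The pipeline has three steps:

1. Align the two vector spaces with orthogonal Procrustes.
2. Score each target word by the cosine distance between its aligned vectors.
3. Compare the resulting ranking with human-annotated change ratings using Spearman's rank correlation.

The function names, the toy data and the particular combination of alignment and distance measure are illustrative assumptions. They do not describe the exact configurations evaluated in this work.

```python
import numpy as np


def procrustes_rotation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Orthogonal R minimising ||a @ R - b||_F; row i of a and b is the same word."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


def cosine_distance(x: np.ndarray, y: np.ndarray) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    return 1.0 - float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def change_scores(space1, space2, vocab, targets):
    """Hypothetical helper: align space1 onto space2, then score each target
    word by the cosine distance between its two vectors (higher = more change)."""
    index = {w: i for i, w in enumerate(vocab)}
    aligned = space1 @ procrustes_rotation(space1, space2)
    return [cosine_distance(aligned[index[w]], space2[index[w]]) for w in targets]


def average_ranks(values) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(predicted, gold) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(predicted), average_ranks(gold))[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = [f"w{i}" for i in range(200)]
    space2 = rng.normal(size=(200, 50))
    rotation, _ = np.linalg.qr(rng.normal(size=(50, 50)))
    space1 = space2 @ rotation                # same meanings, arbitrarily rotated space
    space1[:5] = rng.normal(size=(5, 50))     # toy simulation: 5 words change usage
    scores = change_scores(space1, space2, vocab, vocab[:10])
    gold = [1] * 5 + [0] * 5                  # toy "changed vs. stable" ratings
    print(round(spearman_rho(scores, gold), 3))
```

Procrustes alignment is needed because independently trained embedding spaces are only defined up to a rotation. Without it, the vectors of a word from the two corpora are not directly comparable.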