Analysing Lexical Semantic Change with Contextualised Word Representations

This paper presents the first unsupervised approach to lexical semantic change that makes use of contextualised word representations. We propose a novel method that exploits the BERT neural language model to obtain representations of word usages, clusters these representations into usage types, and measures change over time with three proposed metrics. We create a new evaluation dataset and show that both the model representations and the detected semantic shifts are positively correlated with human judgements. Our extensive qualitative analysis demonstrates that our method captures a variety of synchronic and diachronic linguistic phenomena. We expect our work to inspire further research in this direction.
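The three-step pipeline described above (usage representations, clustering into usage types, change scores) can be illustrated on toy data. This is a minimal sketch under stated assumptions, not the paper's implementation: random 2-D points stand in for BERT usage vectors, plain k-means with farthest-point seeding is an assumed choice of clustering algorithm, and the two change scores shown (Jensen-Shannon divergence and entropy difference between the usage-type distributions of two periods) are illustrative choices, since the abstract does not name its three metrics. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label (usage type) per point."""
    # Farthest-point seeding (assumed here): start at the first point, then
    # repeatedly add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    labels = [nearest(p) for p in points]
    for _ in range(iters):
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels


def usage_distribution(labels, k):
    """Relative frequency of each usage type among one period's usages."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in range(k)]


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon(p, q):
    """JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2, with M the average of P and Q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def demo(seed=42):
    rng = random.Random(seed)

    def sample(centre, n):
        # Hypothetical stand-in for BERT usage vectors: Gaussian noise around a centre.
        return [[x + rng.gauss(0, 0.1) for x in centre] for _ in range(n)]

    old = sample([0.0, 0.0], 20)                            # period 1: one usage type
    new = sample([0.0, 0.0], 10) + sample([5.0, 5.0], 10)   # period 2: a second type appears
    labels = kmeans(old + new, k=2)                         # cluster both periods jointly
    p = usage_distribution(labels[:len(old)], 2)
    q = usage_distribution(labels[len(old):], 2)
    return p, q, jensen_shannon(p, q), entropy(q) - entropy(p)


if __name__ == "__main__":
    print(demo())
```

Because the second period splits evenly between the old usage type and a new one, `demo()` yields usage-type distributions of [1.0, 0.0] and [0.5, 0.5], a Jensen-Shannon divergence of about 0.31 bits and an entropy difference of 1 bit. Clustering both periods jointly is the key design choice: it places all usages in one shared set of usage types, so the per-period distributions are directly comparable.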
