Explaining and Improving BERT Performance on Lexical Semantic Change Detection

Type- and token-based embedding architectures are still competing in lexical semantic change detection. The recent success of type-based models in SemEval-2020 Task 1 has raised the question of why the success of token-based models on a variety of other NLP tasks does not translate to our field. We investigate the influence of a range of variables on clusterings of BERT vectors and show that BERT’s low performance is largely due to orthographic information on the target word, which is encoded even in the higher layers of BERT representations. By reducing the influence of orthography, we considerably improve BERT’s performance.
