A Method of Semantic Change Detection Using Diachronic Corpora Data

The article proposes a method for detecting semantic change in diachronic corpus data. The method rests on the distributional hypothesis, under which a word's meaning is reflected in the contexts where it occurs. The analysis uses frequencies of syntactic bigrams from the English and Russian sub-corpora of Google Books Ngram. To obtain the co-occurrence profile of a word in its new meaning, the syntactic bigrams that contributed most to the change in the word's distribution are selected and their time series are clustered. The method is tested on a group of English and Russian words that gained new meanings in the 20th century. The results show that the proposed method detects semantic changes and determines when they occurred.
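
The selection and clustering steps can be illustrated with a minimal sketch. The abstract does not name the distance measure or the clustering algorithm, so both are assumptions here: each bigram's contribution is taken as its term in the L1 distance between the word's bigram distributions in an early and a late period, and clustering is a simple greedy grouping by Pearson correlation. The function names, the `split` and `threshold` parameters, and the toy counts are all hypothetical.

```python
from math import sqrt

def period_distribution(counts, years):
    """Relative frequency of each bigram, pooled over the given year indices."""
    totals = {b: sum(series[y] for y in years) for b, series in counts.items()}
    grand = sum(totals.values())
    return {b: (t / grand if grand else 0.0) for b, t in totals.items()}

def top_contributors(counts, split, k):
    """Rank bigrams by their term in the L1 distance between two periods.

    Assumption: L1 distance stands in for whatever measure the paper uses.
    """
    n_years = len(next(iter(counts.values())))
    early = period_distribution(counts, range(0, split))
    late = period_distribution(counts, range(split, n_years))
    contrib = {b: abs(late[b] - early[b]) for b in counts}
    return sorted(contrib, key=contrib.get, reverse=True)[:k]

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def cluster_series(counts, bigrams, threshold=0.8):
    """Greedy clustering: a bigram joins the first cluster whose seed series
    correlates with it at or above the threshold, else it starts a new one.

    Assumption: this replaces the paper's (unspecified) clustering algorithm.
    """
    clusters = []
    for b in bigrams:
        for cluster in clusters:
            if pearson(counts[b], counts[cluster[0]]) >= threshold:
                cluster.append(b)
                break
        else:
            clusters.append([b])
    return clusters

# Toy yearly counts, invented for illustration only (not Google Books data).
toy = {
    "mouse/click":  [0, 0, 0, 5, 9, 14],
    "mouse/button": [0, 0, 1, 4, 8, 12],
    "mouse/trap":   [10, 11, 10, 9, 10, 10],
    "mouse/cat":    [20, 19, 21, 20, 19, 20],
}
selected = top_contributors(toy, split=3, k=3)
clusters = cluster_series(toy, selected)
print(selected)
print(clusters)
```

On this toy data the two rising bigrams fall into one cluster, which plays the role of the co-occurrence profile for the new meaning. The year in which that cluster's series start to grow gives a rough estimate of when the change occurred.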
