A state-of-the-art of semantic change computation

This paper reviews the state of the art in semantic change computation, an emerging research field in computational linguistics. It proposes a framework that summarizes the literature by identifying and expounding five essential components of the field: diachronic corpus, diachronic word sense characterization, change modelling, evaluation data and data visualization. Despite the potential of the field, the review shows that current studies focus mainly on testing hypotheses proposed in theoretical linguistics, and that several core issues remain to be solved: the need for diachronic corpora of languages other than English, the need for comprehensive evaluation data, the comparison and construction of approaches to diachronic word sense characterization and change modelling, and further exploration of data visualization techniques for hypothesis justification.
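To make three of the components concrete (diachronic corpus, diachronic word sense characterization, change modelling), here is a minimal sketch assuming a simple count-based distributional approach. A word is characterized in each time slice by a vector of context-word counts, and change is modelled as one minus the cosine similarity between the two vectors. This is an illustration only, not a method taken from the review. The two-slice toy corpus, the window size, and the function names `cooccurrence_vector` and `change_score` are all hypothetical.

```python
from collections import Counter
import math


def cooccurrence_vector(sentences, target, window=2):
    """Characterize `target` by counting words within +/- `window` tokens of each occurrence."""
    counts = Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either vector is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def change_score(slice_a, slice_b, target, window=2):
    """Hypothetical change measure: 1 - cosine similarity of the target's vectors in two periods."""
    va = cooccurrence_vector(slice_a, target, window)
    vb = cooccurrence_vector(slice_b, target, window)
    return 1.0 - cosine(va, vb)


# Toy diachronic corpus: two hypothetical time slices with invented sentences.
corpus = {
    "1900s": ["the mouse ran under the barn", "a grey mouse ate the grain"],
    "2000s": ["click the mouse to open the file", "plug the usb mouse into the laptop"],
}

print(round(change_score(corpus["1900s"], corpus["2000s"], "mouse"), 3))  # 0.465
```

In this toy example the only shared context word is "the", so raw counts overstate the similarity between the two periods because function words dominate. Practical systems therefore usually weight counts (for example with PPMI) or use diachronic word embeddings in place of raw co-occurrence vectors. A word missing from one slice gets an empty vector there and a score of 1.0.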
