Ad Hoc Monitoring of Vocabulary Shifts over Time

Word meanings change over time. Detecting shifts in the meaning of particular words has recently been the focus of much research. We address the complementary problem of monitoring shifts in vocabulary over time: given a small seed set of words, we want to track which terms are used over time to refer to the concept that the seed words denote. We propose an algorithm for this task. Using distributional semantic methods, we infer a series of semantic spaces over time from a large collection of time-stamped, unstructured text documents. We then construct semantic networks of terms based on their representations in these spaces and apply graph-based measures to compute the saliency of each term. From these measures, the algorithm produces, as its final output, ranked lists of terms that represent the concept underlying the initial seed terms over time. To the best of our knowledge, monitoring shifting vocabularies over time for an ad hoc set of seed words is a new task, so we construct our own evaluation set. Our main contributions are the introduction of the task of ad hoc monitoring of vocabulary shifts over time, an algorithm for tracking shifting vocabularies over time given a small set of seed words, and a systematic evaluation of the results over a substantial period of time (more than four decades). Additionally, we make our newly constructed evaluation set publicly available.
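The abstract does not specify which semantic model or which graph measure is used, so the following is only a minimal sketch of the pipeline it describes, under stated assumptions. It assumes each time period's semantic space is already available as a mapping from terms to vectors (for example, embeddings trained separately per period), builds a term network by linking the seed terms to their nearest neighbours by cosine similarity, and uses weighted PageRank as a stand-in saliency measure. The function names (`build_graph`, `pagerank`, `rank_terms_over_time`), the parameters `k` and `threshold`, and the toy two-dimensional vectors are hypothetical illustrations, not the paper's method or data.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Space = Dict[str, Vector]            # one semantic space per period: term -> vector
Graph = Dict[str, Dict[str, float]]  # undirected weighted adjacency map


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(space: Space, seeds: List[str], k: int = 3,
                threshold: float = 0.5) -> Graph:
    """Hypothetical network construction: add each seed's k nearest
    neighbours (if similarity >= threshold), then connect every pair of
    selected terms whose cosine similarity is >= threshold."""
    nodes = {s for s in seeds if s in space}
    for s in list(nodes):
        sims = sorted(((cosine(space[s], space[t]), t) for t in space if t != s),
                      reverse=True)
        nodes.update(t for sim, t in sims[:k] if sim >= threshold)
    graph: Graph = {n: {} for n in nodes}
    for a in nodes:
        for b in nodes:
            if a < b:  # visit each unordered pair once
                sim = cosine(space[a], space[b])
                if sim >= threshold:
                    graph[a][b] = graph[b][a] = sim
    return graph


def pagerank(graph: Graph, damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Weighted PageRank by power iteration, used here only as a stand-in
    saliency measure. Rank held by isolated nodes is spread uniformly."""
    n = len(graph)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in graph if not graph[u])
        new_rank = {}
        for v in graph:
            # Each neighbour u passes on rank in proportion to edge weight.
            incoming = sum(rank[u] * w / sum(graph[u].values())
                           for u, w in graph[v].items())
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def rank_terms_over_time(spaces: Dict[int, Space], seeds: List[str],
                         top_n: int = 5, k: int = 3,
                         threshold: float = 0.5) -> Dict[int, List[str]]:
    """For each period, build the term network and return the top_n terms
    by saliency (ties broken alphabetically)."""
    ranked: Dict[int, List[str]] = {}
    for period in sorted(spaces):
        scores = pagerank(build_graph(spaces[period], seeds, k, threshold))
        ranked[period] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return ranked


# Toy, invented vectors for two periods (illustration only).
spaces = {
    1950: {"car": (1.0, 0.0), "automobile": (0.95, 0.1),
           "horse": (0.1, 1.0), "wagon": (0.2, 0.9)},
    1990: {"car": (1.0, 0.0), "automobile": (0.9, 0.2),
           "suv": (0.97, 0.05), "horse": (0.0, 1.0)},
}
ranked = rank_terms_over_time(spaces, ["automobile"], k=2, threshold=0.5)
print(ranked)
```

In this toy setup the term "suv" only enters the ranked list for the later period, which is the kind of vocabulary shift the task targets. Restricting the network to seed neighbourhoods keeps each per-period graph small; a real system would likely tune `k`, `threshold`, and the saliency measure against an evaluation set.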
