Measuring, Predicting and Visualizing Short-Term Change in Word Representation and Usage in VKontakte Social Network

Language in social media is extremely dynamic: new words emerge, trend and disappear, while the meaning of existing words can fluctuate over time. These dynamics are especially pronounced during periods of crisis. This work addresses the tasks of measuring, visualizing and predicting short-term text representation shift, i.e. the change in a word's contextual semantics, and of contrasting this shift with surface-level word dynamics, or concept drift, observed in social media streams. Unlike previous approaches to learning word representations from text, we study the relationship between short-term concept drift and representation shift on a large social media corpus: VKontakte posts in Russian collected during the Russia-Ukraine crisis of 2014-2015. Our novel contributions include quantitative and qualitative approaches to (1) measure short-term representation shift and contrast it with surface-level concept drift; (2) build predictive models that forecast short-term shifts in meaning from previous meaning as well as from concept drift; and (3) visualize short-term representation shift for example keywords, demonstrating how our approach can be used in practice to discover and track the meaning of newly emerging terms in social media. We show that short-term representation shift can be accurately predicted up to several weeks in advance. Our approach to modeling and visualizing word representation shifts in social media can be used to explore and characterize specific aspects of a streaming corpus during crisis events, and could potentially improve other downstream classification tasks such as real-time event detection.
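The abstract does not spell out how the two signals are computed. As a minimal sketch, assuming a word's weekly embedding vectors already share one coordinate space (for example, because each week's model is initialized from the previous week's) and that weekly token counts are available, representation shift can be measured as the cosine distance between consecutive weekly vectors, and concept drift as the week-over-week change in relative frequency. A lag-1 least-squares model then illustrates forecasting shift several weeks ahead. All function names and the lag-1 model are hypothetical illustrations, not the authors' exact method.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def representation_shift(weekly_vectors: List[Sequence[float]]) -> List[float]:
    """Cosine distance between a word's vectors in consecutive weeks.

    Assumption: all weekly vectors live in one shared coordinate space,
    so distances between weeks are meaningful without extra alignment.
    """
    return [
        cosine_distance(prev, cur)
        for prev, cur in zip(weekly_vectors, weekly_vectors[1:])
    ]


def concept_drift(counts: List[int], totals: List[int]) -> List[float]:
    """Week-over-week change in a word's relative frequency (surface-level signal)."""
    freqs = [c / t for c, t in zip(counts, totals)]
    return [cur - prev for prev, cur in zip(freqs, freqs[1:])]


def fit_lag1(series: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of series[t + 1] ~ a * series[t] + b."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0.0:
        # Constant history: the best linear predictor is the mean.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def forecast(series: List[float], a: float, b: float, steps: int) -> List[float]:
    """Iterate the fitted lag-1 model forward `steps` weeks past the last value."""
    preds: List[float] = []
    last = series[-1]
    for _ in range(steps):
        last = a * last + b
        preds.append(last)
    return preds
```

In use, one would compute `representation_shift` for a keyword across the weekly models, fit `fit_lag1` on that history, and call `forecast(shifts, a, b, steps=4)` for a four-week horizon. This toy forecaster uses only the shift history; the models described above also draw on concept drift, which would call for a multivariate regression (or a sequence model) rather than this single-feature fit.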