Temporal Methods to Detect Content-Based Anomalies in Social Media

We develop a method for time-dependent topic tracking and meme trending in social media. Our objective is to identify time periods whose content differs significantly from the norm, and we use two techniques to do so. The first is an information-theoretic analysis of the distributions of terms emitted during different time periods. The second clusters the documents from each time period and analyzes the tightness of each clustering. We also discuss a method for combining the scores produced by each technique, and we provide empirical analysis of our methodology on several Twitter datasets.
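To make the first technique concrete, here is a minimal sketch of an information-theoretic comparison of term distributions across time periods. The abstract does not say which measure is used, so the Jensen-Shannon divergence here is an assumption, as are whitespace tokenization, the pooled "all other periods" baseline, and the function names `term_distribution`, `js_divergence`, and `score_periods`, which are hypothetical.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Relative term frequencies over a list of documents (whitespace tokens)."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint support."""
    div = 0.0
    for term in set(p) | set(q):
        pt, qt = p.get(term, 0.0), q.get(term, 0.0)
        m = 0.5 * (pt + qt)  # mixture probability, positive for every term in the union
        if pt > 0:
            div += 0.5 * pt * math.log2(pt / m)
        if qt > 0:
            div += 0.5 * qt * math.log2(qt / m)
    return div


def score_periods(periods):
    """Score each period against the pooled documents of all other periods.

    Assumed baseline choice: "normal" content is everything outside the period.
    """
    scores = []
    for i, docs in enumerate(periods):
        baseline = [d for j, other in enumerate(periods) if j != i for d in other]
        scores.append(js_divergence(term_distribution(docs),
                                    term_distribution(baseline)))
    return scores


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    periods = [
        ["good morning coffee", "traffic is slow"],
        ["morning coffee again", "slow traffic today"],
        ["earthquake downtown", "earthquake felt everywhere"],
    ]
    for i, s in enumerate(score_periods(periods)):
        print(f"period {i}: divergence {s:.3f}")
```

A period whose vocabulary overlaps little with the baseline receives a score near 1 and would be flagged as a candidate anomaly. A real pipeline would likely need smoothing, stop-word handling, and a sliding-window baseline, none of which are shown here.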
