A Methodological Framework for Statistical Analysis of Social Text Streams

Social media are among the main sources of user-generated content, providing vast amounts of data on a daily basis and covering a wide range of topics, interests and events. To identify and link meaningful and relevant information, clustering algorithms have been used to partition this content. We have found, however, that these algorithms exhibit various shortcomings when dealing with social media text, which is dynamic and streaming in nature. We therefore explore the idea of estimating the algorithms’ parameters from observations of how the clusters’ properties (such as centroid, shape and density) evolve. By experimenting with these properties, we propose a methodological framework that detects the evolution of the clusters’ centroid, shape and density and explores their role in parameter estimation.
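As an illustration of what tracking the evolution of cluster properties could look like, the following minimal sketch compares one cluster across two consecutive time windows. It is not the framework proposed here: the function names (`cluster_properties`, `property_drift`), the use of mean distance to the centroid as a stand-in for shape, and points per unit radius as a stand-in for density are all assumptions made for this example, as is the toy two-dimensional data.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def cluster_properties(points):
    """Return the centroid plus assumed proxies for shape and density."""
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    radius = max(dists)
    # Assumed shape proxy: mean distance of members to the centroid.
    spread = sum(dists) / len(dists)
    # Assumed density proxy: members per unit of radius (guard zero radius).
    density = len(points) / radius if radius > 0 else float("inf")
    return {"centroid": c, "spread": spread, "density": density}


def property_drift(prev_points, curr_points):
    """Change in each property between two windows of the same cluster."""
    prev = cluster_properties(prev_points)
    curr = cluster_properties(curr_points)
    return {
        "centroid_shift": math.dist(prev["centroid"], curr["centroid"]),
        "spread_change": curr["spread"] - prev["spread"],
        "density_change": curr["density"] - prev["density"],
    }


# Toy document vectors for one cluster in two consecutive windows.
window_1 = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
window_2 = [[1.0, 1.0], [5.0, 1.0], [1.0, 5.0], [5.0, 5.0]]

drift = property_drift(window_1, window_2)
print(drift)
```

In this toy case the centroid moves and the cluster widens, so the density proxy drops. Under the idea explored here, signals of this kind would be the observations from which a clustering algorithm's parameters are estimated; how they map to specific parameters is not specified by this sketch.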
