A Distributed Framework for Early Trending Topics Detection on Big Social Networks Data Threads

Social networks have become engines of big data production, and analyzing their output can reveal trending topics whose otherwise hidden knowledge is useful in a variety of applications and settings. This paper addresses the early prediction of popular topics and trends from social network data streams, a task that demands distributed software architectures. Trending topics are detected with an online time series classification model implemented in a flexible, adaptive distributed framework. Emphasis is placed on early detection and on the performance of the proposed framework. The framework builds on the lambda architecture design, and the experiments indicate that the approach detects trends early with high performance rates, with results validated against a popular microblogging service.
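To make the idea of online time series classification for trend detection concrete, the following is a minimal single-machine sketch in Python. It is not the paper's implementation: the abstract does not specify the classifier, so the weighted-vote scheme over labeled reference series, the names `distance`, `trending_score`, and `OnlineTrendDetector`, and the parameters `gamma`, `window_size`, and `threshold` are all assumptions chosen for illustration. The distributed and lambda architecture aspects are deliberately left out.

```python
import math
from collections import deque


def distance(window, reference):
    """Smallest squared Euclidean distance between the window and any
    same-length contiguous slice of the reference series."""
    n = len(window)
    best = math.inf
    for start in range(len(reference) - n + 1):
        segment = reference[start:start + n]
        d = sum((w - r) ** 2 for w, r in zip(window, segment))
        best = min(best, d)
    return best


def trending_score(window, trending_refs, non_trending_refs, gamma=1.0):
    """Share of the exponentially weighted vote that goes to the trending class.
    Returns 0.5 when no reference contributes any weight."""
    pos = sum(math.exp(-gamma * distance(window, r)) for r in trending_refs)
    neg = sum(math.exp(-gamma * distance(window, r)) for r in non_trending_refs)
    total = pos + neg
    return pos / total if total > 0 else 0.5


class OnlineTrendDetector:
    """Hypothetical detector: classifies the latest activity window of one topic
    each time a new count arrives."""

    def __init__(self, trending_refs, non_trending_refs,
                 window_size=4, threshold=0.5, gamma=1.0):
        self.trending_refs = trending_refs
        self.non_trending_refs = non_trending_refs
        self.threshold = threshold
        self.gamma = gamma
        self.window = deque(maxlen=window_size)

    def update(self, count):
        """Add one observation; return True if the topic currently looks trending."""
        self.window.append(count)
        if len(self.window) < self.window.maxlen:
            return False  # not enough history yet
        score = trending_score(list(self.window), self.trending_refs,
                               self.non_trending_refs, self.gamma)
        return score > self.threshold


# Illustrative, made-up reference series (mentions per time bin).
trending_refs = [[1, 2, 4, 8, 16, 32]]
non_trending_refs = [[3, 3, 3, 3, 3, 3]]

detector = OnlineTrendDetector(trending_refs, non_trending_refs)
flags = [detector.update(c) for c in [3, 3, 3, 3, 4, 8, 16, 32]]
```

In this toy stream the detector stays silent while the counts are flat and flips to trending once the recent window starts to resemble the growth pattern of the trending reference, which is the sense in which detection is "early": the decision is made on a short prefix of the activity curve rather than on the full series.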
