Forecasting the Future: Leveraging RNN based Feature Concatenation for Tweet Outbreak Prediction

Cascade outbreaks are a common phenomenon across social networking platforms. They can have severe implications: fake news or a rumour can reach a significant number of people, or hateful content can be propagated and may incite violence. Early prediction of cascade outbreaks helps in taking proper remedial action and is therefore an important research direction. Most existing approaches predict the popularity of a social networking post using either machine learning techniques or statistical models. Simple machine learning based approaches may miss important features, while statistical models use hard-coded functions that might not suit a different scenario. With the availability of large amounts of data, deep learning based models have recently also been applied to cascade outbreak prediction. This study identifies the limitations of existing deep learning based approaches and proposes a Recurrent Neural Network based Hybrid Model with Feature Concatenation (RNN-HMFC). RNN-HMFC captures important latent features of the textual aspect and of the retweet information using an LSTM and a GRU, respectively, and also uses a set of handcrafted features, such as additional tweet information and user social information, to predict virality. We achieve 2.7% to 6.45% higher accuracy than state-of-the-art methods on different datasets.
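The feature-concatenation idea described above can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation: the layer sizes, the random weights, the omission of bias terms, the concatenation order, and the example inputs (toy token vectors, per-interval retweet values, and three handcrafted features) are all assumptions made for illustration. The names `LSTMCell`, `GRUCell`, and `predict_outbreak` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Bias-free LSTM used here to encode the tweet text (assumed setup)."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        self.Wi, self.Wf, self.Wo, self.Wg = (
            rand_matrix(n_hid, n_in + n_hid, rng) for _ in range(4)
        )

    def encode(self, seq):
        h = [0.0] * self.n_hid
        c = [0.0] * self.n_hid
        for x in seq:
            xh = list(x) + h
            i = [sigmoid(a) for a in matvec(self.Wi, xh)]    # input gate
            f = [sigmoid(a) for a in matvec(self.Wf, xh)]    # forget gate
            o = [sigmoid(a) for a in matvec(self.Wo, xh)]    # output gate
            g = [math.tanh(a) for a in matvec(self.Wg, xh)]  # candidate cell
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h  # final hidden state = latent text features


class GRUCell:
    """Bias-free GRU used here to encode the retweet sequence (assumed setup)."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        self.Wz, self.Wr, self.Wh = (
            rand_matrix(n_hid, n_in + n_hid, rng) for _ in range(3)
        )

    def encode(self, seq):
        h = [0.0] * self.n_hid
        for x in seq:
            xh = list(x) + h
            z = [sigmoid(a) for a in matvec(self.Wz, xh)]    # update gate
            r = [sigmoid(a) for a in matvec(self.Wr, xh)]    # reset gate
            xrh = list(x) + [ri * hi for ri, hi in zip(r, h)]
            cand = [math.tanh(a) for a in matvec(self.Wh, xrh)]
            h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
        return h  # final hidden state = latent retweet features


def predict_outbreak(tokens, retweets, handcrafted, lstm, gru, w_out):
    text_feat = lstm.encode(tokens)
    retweet_feat = gru.encode(retweets)
    # Feature concatenation: latent text + latent retweet + handcrafted.
    fused = text_feat + retweet_feat + list(handcrafted)
    score = sigmoid(sum(w * f for w, f in zip(w_out, fused)))
    return score, fused


# Toy usage with made-up inputs and untrained random weights.
rng = random.Random(0)
lstm = LSTMCell(n_in=3, n_hid=4, rng=rng)
gru = GRUCell(n_in=1, n_hid=4, rng=rng)
tokens = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]  # hypothetical word vectors
retweets = [[0.2], [0.5], [0.9]]              # hypothetical scaled retweet counts
handcrafted = [1.0, 0.0, 0.35]                # hypothetical tweet/user features
w_out = [rng.uniform(-0.1, 0.1) for _ in range(4 + 4 + 3)]
score, fused = predict_outbreak(tokens, retweets, handcrafted, lstm, gru, w_out)
print(len(fused), score)
```

Because the weights are random, the printed score carries no meaning; the point is only that the two recurrent encoders and the handcrafted features meet in a single fused vector before the final classifier.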
