Understanding Information Spreading Mechanisms During COVID-19 Pandemic by Analyzing the Impact of Tweet Text and User Features for Retweet Prediction

COVID-19 has affected the world economy and the daily routines of almost everyone. It has been a hot topic on social media platforms such as Twitter and Facebook. These platforms enable users to share information with other users, who can reshare it in turn, causing the information to spread. Twitter's retweet functionality allows users to share existing content with other users without altering the original content. Analysis of social media platforms can help detect emergencies during pandemics, which in turn allows preventive measures to be taken. One such analysis is predicting the number of retweets for a given COVID-19 related tweet. Recently, CIKM organized a retweet prediction challenge for COVID-19 tweets that focused on using numeric features only. We hypothesize, however, that tweet text may play a vital role in accurate retweet prediction. In this paper, we combine numeric and text features for COVID-19 related retweet prediction. For this purpose, we propose two CNN- and RNN-based models and evaluate their performance on the publicly available TweetsCOV19 dataset using seven different evaluation metrics. Our evaluation results show that combining tweet text with numeric features improves the performance of retweet prediction significantly.

Preprint submitted to Elsevier, June 15, 2021. arXiv:2106.07344v1 [cs.SI] 26 May 2021
