An Infoveillance System for Detecting and Tracking Relevant Topics From Italian Tweets During the COVID-19 Event

The year 2020 opened with a dramatic epidemic caused by a new species of coronavirus, which the WHO soon declared a pandemic because of the high number of deaths and the critical mass of hospitalized patients worldwide, on the order of millions. The COVID-19 pandemic forced the governments of hundreds of countries to impose heavy restrictions on citizens' socio-economic life. Italy was one of the most affected countries, with long-term restrictions that impacted its socio-economic fabric. During this lockdown period, people got their information mostly from Online Social Media, where a heated debate followed all the main ongoing events. In this scenario, this study presents an in-depth analysis of the main emergent topics discussed during the lockdown phase within the Italian Twitter community. The analysis was conducted through a general-purpose methodological framework, grounded on a biological metaphor and on a chain of NLP and graph analysis techniques, for detecting and tracking emerging topics in Online Social Media, e.g., streams of Twitter data. A term-frequency analysis over subsequent time slots is pipelined with nutrition and energy metrics to compute hot terms, also exploiting tweet quality information such as the social influence of users. Finally, a co-occurrence analysis is used to build a topic graph from which emerging topics are selected. Through careful parameter setting, we demonstrate the effectiveness of the topic tracking system, tailored to the current Twitter standard API restrictions, in capturing the main sociopolitical events that occurred during this dramatic phase.
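The pipeline summarized above (per-slot term weighting, nutrition and energy scores, then a co-occurrence graph over hot terms) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the augmented term-frequency weighting, the energy formula (difference of squared nutrition values, discounted by slot age), the `window` and `top_k` parameters, and the toy tweets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def nutrition(slot):
    """Sum of per-tweet term weights, scaled by an author influence weight.

    slot: list of (tokens, author_weight) pairs for one time slot.
    The augmented-TF weighting (0.5 + 0.5 * tf / max_tf) is an assumption.
    """
    nutr = defaultdict(float)
    for tokens, weight in slot:
        if not tokens:
            continue
        counts = Counter(tokens)
        max_tf = max(counts.values())
        for term, tf in counts.items():
            nutr[term] += (0.5 + 0.5 * tf / max_tf) * weight
    return dict(nutr)


def energy(history, current, window=2):
    """Score how much a term's nutrition grew relative to recent slots.

    history: past nutrition dicts, oldest first; older slots count less.
    The squared-difference form is an assumed, illustrative formula.
    """
    recent = history[-window:]
    scores = {}
    for term, n_now in current.items():
        total = 0.0
        for age, past in enumerate(reversed(recent), start=1):
            total += (n_now ** 2 - past.get(term, 0.0) ** 2) / age
        scores[term] = total
    return scores


def hot_terms(scores, top_k=2):
    """Return the top_k terms by energy (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def cooccurrence_edges(slot, terms):
    """Count how often pairs of the given terms appear in the same tweet."""
    keep = set(terms)
    edges = Counter()
    for tokens, _ in slot:
        present = sorted(keep & set(tokens))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return dict(edges)


# Toy data (hypothetical): two time slots of tokenized tweets with author weights.
slots = [
    [(["virus", "news"], 1.0), (["news", "calcio"], 1.0)],
    [(["virus", "lockdown"], 2.0), (["lockdown", "decreto"], 1.0), (["news"], 1.0)],
]
history = [nutrition(slots[0])]
current = nutrition(slots[1])
scores = energy(history, current)
hot = hot_terms(scores)
edges = cooccurrence_edges(slots[1], hot)
print(hot, edges)
```

In this sketch a term becomes "hot" when its influence-weighted usage in the current slot rises sharply compared with recent slots, and the weighted edges between hot terms form the graph from which topics would then be extracted.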
