Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set

Background: At the time of this writing, the coronavirus disease (COVID-19) pandemic has already put tremendous strain on the citizens, resources, and economies of many countries around the world. Social distancing measures, travel bans, self-quarantines, and business closures are changing the very fabric of societies worldwide. With people forced out of public spaces, much of the conversation about these phenomena now occurs online on social media platforms like Twitter.

Objective: In this paper, we describe a multilingual COVID-19 Twitter data set that we are making available to the research community via our COVID-19-TweetIDs GitHub repository.

Methods: We started this ongoing data collection on January 28, 2020, leveraging Twitter’s streaming application programming interface (API) and Tweepy to follow certain keywords and accounts that were trending at the time data collection began. We used Twitter’s search API to query for past tweets, resulting in the earliest tweets in our collection dating back to January 21, 2020.

Results: Since the inception of our collection, we have actively maintained and updated our GitHub repository on a weekly basis. We have published over 123 million tweets, with over 60% of the tweets in English. This paper also presents basic statistics that show that Twitter activity responds to COVID-19-related events.

Conclusions: It is our hope that our contribution will enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications. This data set could also help track COVID-19-related misinformation and unverified rumors or enable the understanding of fear and panic, and undoubtedly more.
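The collection logic described in the Methods (following keywords and accounts) and the language breakdown reported in the Results can be illustrated with a minimal offline sketch. It does not call Twitter's API or Tweepy. The keyword list, account list, tweet fields (`text`, `user`, `lang`), and function names are all assumptions made for illustration, not the authors' actual configuration or code.

```python
from collections import Counter

# Hypothetical tracking lists: the actual keywords and accounts
# followed by the authors are not given in this abstract.
KEYWORDS = {"coronavirus", "covid-19"}
ACCOUNTS = {"who", "cdcgov"}


def matches(tweet):
    """Return True if the tweet is from a followed account or mentions a keyword."""
    text = tweet.get("text", "").lower()
    user = tweet.get("user", "").lower()
    return user in ACCOUNTS or any(keyword in text for keyword in KEYWORDS)


def language_share(tweets):
    """Return the fraction of tweets per language code ('und' if missing)."""
    counts = Counter(tweet.get("lang", "und") for tweet in tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


# Made-up sample records, simplified to three fields.
sample = [
    {"text": "New coronavirus case reported", "user": "alice", "lang": "en"},
    {"text": "Lovely weather today", "user": "bob", "lang": "en"},
    {"text": "Actualización sobre COVID-19", "user": "carla", "lang": "es"},
    {"text": "Daily briefing at noon", "user": "WHO", "lang": "en"},
]

kept = [tweet for tweet in sample if matches(tweet)]
shares = language_share(kept)
```

In this sketch, `kept` holds the tweets that would have been collected and `shares` gives the per-language fraction, the kind of statistic behind the "over 60% of the tweets in English" figure.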
