A First Instagram Dataset on COVID-19

The novel coronavirus (COVID-19) pandemic is drastically shaping and reshaping many aspects of our lives, with a huge impact on our social lives. In this era of lockdown policies in most major cities around the world, we see a huge increase in personal and professional engagement on social media. Social media plays an important role in news propagation as well as in keeping people in contact. At the same time, this source is both a blessing and a curse: the coronavirus infodemic has become a major concern and is already a topic that needs special attention and further research. In this paper, we provide a multilingual coronavirus (COVID-19) Instagram dataset that we have been collecting continuously since March 30, 2020. We are making our dataset available to the research community on GitHub. We believe that this contribution will help the community better understand the dynamics behind this phenomenon on Instagram, one of the major social media platforms. This dataset could also help in studying the propagation of misinformation related to this outbreak.
