Track Iran's national COVID-19 response committee’s major concerns using two-stage unsupervised topic modeling

Background: Since the World Health Organization (WHO) declared COVID-19 a Public Health Emergency of International Concern (PHEIC) on January 30, 2020, governments have faced the challenge of responding in a timely manner. The efficacy of these responses depends directly on the social behavior of the target society. People react to government actions according to the information they receive through different channels, such as news and social networks. Analyzing the news therefore gives a concise view of the information people received during the outbreak.

Methods: The raw data used in this study were collected from the official Telegram channels of news wires and agencies and exceed 2,400,000 posts. Posts that quote members of Iran's National COVID-19 Response Committee (NCRC) were collected, cleaned, and split into sentences. Topic modeling and tracking are applied in a two-stage framework, customized for this problem, that separates miscellaneous sentences from those expressing concerns. The first stage takes sentence embedding vectors as input and groups them with the Mapper algorithm; sentences belonging to singleton nodes are labeled as miscellaneous. In the second stage, the remaining sentences are vectorized with the TF-IDF weighting scheme and topically modeled with the LDA method. Finally, relevant topics are aligned with the list of policies and actions set up by the NCRC, referred to as topic themes.

Results: Our results show three major concerns, together presented in about half of the sentences: (1) PCR laboratory testing, diagnosis, and screening; (2) closure of the education system; and (3) awareness actions on hand washing and face mask usage. Among the eight themes, intra-provincial travel and traffic restrictions, as well as briefings on the national and provincial status, are under-represented. The timeline of concerns, annotated with the preventive actions, illustrates how the concerns addressed by the NCRC changed over time. It shows that although announcements and public responses did not lag behind events, they cannot be considered timely. Furthermore, the fluctuating series of concerns reveals that the NCRC lacked a long-term response plan and that its members reacted to the nearest announced policy or action.

Conclusion: The results of our study can be used as a quantitative indicator for evaluating whether Iran's NCRC delivered a timely public response during the first three months of the outbreak. They can also be used in comparative studies to investigate differences between awareness actions in various countries. The results of our custom-designed framework show that about one-third of the NCRC members' discussions cover miscellaneous topics that must be removed from the data.
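The two-stage flow described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. Stage 1 here uses a plain cosine-similarity neighbor check as a simplified stand-in for the Mapper algorithm, and the LDA step itself is omitted. The function names, the 0.8 threshold, the sentences, and the two-dimensional embeddings are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def split_singletons(vectors, threshold=0.8):
    """Stage 1 stand-in (NOT Mapper): a sentence with no neighbor at or
    above the similarity threshold is treated as a singleton."""
    n = len(vectors)
    has_neighbor = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                has_neighbor[i] = has_neighbor[j] = True
    kept = [i for i in range(n) if has_neighbor[i]]
    misc = [i for i in range(n) if not has_neighbor[i]]
    return kept, misc


def tfidf(docs):
    """Stage 2 input: TF-IDF weights, tf * log(N / df), per tokenized sentence."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical toy input: three sentences with made-up 2-D embeddings.
sentences = [
    "schools will stay closed next week",
    "schools and universities stay closed",
    "thank you all for coming today",
]
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]

kept, misc = split_singletons(embeddings)
weights = tfidf([sentences[i].split() for i in kept])
print("kept:", kept, "miscellaneous:", misc)
```

In a fuller pipeline, the TF-IDF-weighted sentences that survive stage 1 would be passed to an LDA implementation, and the resulting topics would be matched manually to the NCRC topic themes.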
