Global Surveillance of COVID-19 by mining news media using a multi-source dynamic embedded topic model

As the COVID-19 pandemic continues to unfold, understanding the global impact of non-pharmacological interventions (NPI) is important for formulating effective intervention strategies, particularly as many countries prepare for future waves. We used a machine learning approach to distill latent topics related to NPI from large-scale international news media. We hypothesize that these topics are informative about the timing and nature of implemented NPI, and that they depend on the source of the information (e.g., local news versus official government announcements) and on the target country. Given a set of latent topics associated with NPI (e.g., self-quarantine, social distancing, online education), we assume that countries and media sources have different prior distributions over these topics, and that news articles are generated by sampling from these priors. To model the source-specific topic priors, we developed a semi-supervised, multi-source, dynamic, embedded topic model. For each news article, our model simultaneously infers latent topics and learns a linear classifier that predicts NPI labels from the article's topic mixture. To learn these models, we developed an efficient end-to-end amortized variational inference algorithm. We applied our models to news data collected and labelled by the World Health Organization (WHO) and the Global Public Health Intelligence Network (GPHIN). In comprehensive experiments, we observed better topic quality and higher intervention prediction accuracy than with the baseline embedded topic models, which ignore information on media source and intervention labels. The inferred latent topics reveal distinct policies and media framing in different countries and media sources, and also characterize reactions to COVID-19 and NPI in a semantically meaningful manner. Our PyTorch code is available on GitHub (https://github.com/li-lab-mcgill/covid19_media).
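The generative view described above can be illustrated with a minimal sketch. It is not the authors' PyTorch implementation: it uses only the Python standard library, omits the dynamic (time-varying) component and the variational inference, and every name, dimension, and parameter value (`source_prior_means`, `topic_embeddings`, the classifier weights, and so on) is a hypothetical placeholder. It assumes a logistic-normal topic prior per source, topic-word distributions given by a softmax over word-topic embedding inner products, and one independent sigmoid output per NPI label.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def topic_word_dist(topic_embedding, word_embeddings):
    """Word distribution of one embedded topic: softmax of inner products."""
    return softmax([dot(w, topic_embedding) for w in word_embeddings])


def sample_topic_mixture(prior_mean, rng, sigma=1.0):
    """Logistic-normal draw: Gaussian noise around a source-specific mean."""
    return softmax([rng.gauss(mu, sigma) for mu in prior_mean])


def predict_npi_probs(theta, weights, bias):
    """Linear classifier on the topic mixture (assumed: one sigmoid per label)."""
    logits = [dot(w, theta) + b for w, b in zip(weights, bias)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


rng = random.Random(0)
num_topics, vocab_size, embed_dim, num_labels = 3, 5, 2, 2

# Hypothetical embeddings; in practice these would be learned or pre-trained.
word_embeddings = [[rng.gauss(0, 1) for _ in range(embed_dim)]
                   for _ in range(vocab_size)]
topic_embeddings = [[rng.gauss(0, 1) for _ in range(embed_dim)]
                    for _ in range(num_topics)]

# Hypothetical source-specific prior means over topics.
source_prior_means = {
    "local_news": [1.0, 0.0, -1.0],
    "government": [-1.0, 0.0, 1.0],
}

# Topic mixture for one article from a given source.
theta = sample_topic_mixture(source_prior_means["local_news"], rng)

# Article-level word distribution: mixture of the topic-word distributions.
beta = [topic_word_dist(a, word_embeddings) for a in topic_embeddings]
word_probs = [sum(theta[k] * beta[k][v] for k in range(num_topics))
              for v in range(vocab_size)]

# Hypothetical classifier parameters mapping topic mixtures to NPI labels.
weights = [[rng.gauss(0, 1) for _ in range(num_topics)]
           for _ in range(num_labels)]
bias = [0.0] * num_labels
npi_probs = predict_npi_probs(theta, weights, bias)
```

Here `theta` plays the role of the article's topic mixture, `word_probs` is the distribution from which its words would be drawn, and `npi_probs` is the classifier output computed from the same mixture, which is what ties the supervised labels to the topics.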
