Detecting Emerging Symptoms of COVID-19 using Context-based Twitter Embeddings

In this paper, we present an iterative graph-based approach for detecting symptoms of COVID-19, a disease whose pathology appears to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COVID-19, we also test whether the proposed approach generalizes to the problem of detecting Adverse Drug Reactions (ADRs). We find that the approach, applied to Twitter data, can detect symptom mentions substantially before they are reported by the Centers for Disease Control and Prevention (CDC).
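The abstract does not spell out the algorithm, so the following is only a minimal sketch of what iterative, graph-based expansion of a seed word list over embeddings could look like in general. It is not the authors' method. The function names, the cosine threshold, the iteration cap, and the toy two-dimensional vectors are all hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(embeddings, seeds, threshold=0.9, max_iters=5):
    """Grow a seed lexicon over an implicit similarity graph.

    Two words are treated as connected when the cosine similarity of their
    embeddings is at least `threshold` (a hypothetical cut-off). Each
    iteration adds every word connected to a current lexicon member and
    stops early once nothing new is added.
    """
    # Ignore seeds that have no embedding.
    lexicon = {s for s in seeds if s in embeddings}
    for _ in range(max_iters):
        added = {
            word
            for word, vec in embeddings.items()
            if word not in lexicon
            and any(cosine(vec, embeddings[s]) >= threshold for s in lexicon)
        }
        if not added:
            break
        lexicon |= added
    return lexicon


# Toy embeddings, invented for illustration only (not real Twitter vectors).
toy = {
    "fever": [1.0, 0.0],
    "chills": [0.95, 0.2],
    "shivering": [0.8, 0.5],
    "lockdown": [0.0, 1.0],
}

result = expand_lexicon(toy, {"fever"})
print(sorted(result))
```

In this toy setup, "shivering" is not similar enough to the seed "fever" to be added directly and is reached only through "chills" in a second pass, which is the behaviour that makes the expansion iterative rather than a single nearest-neighbour lookup. A real system would also need a way to stop unrelated words from drifting in over successive passes.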
