Leveraging graph topology and semantic context for pharmacovigilance through twitter-streams

Background: Adverse drug events (ADEs) are one of the leading causes of post-therapeutic death, and their identification is an important challenge of modern precision medicine. Unfortunately, the onset and effects of ADEs are often underreported, which complicates timely intervention. With over 500 million posts per day, Twitter is a widely used social media platform. The ubiquity of day-to-day personal information exchange on Twitter makes it a promising target for data mining for ADE identification and intervention. Three technical challenges are central to this problem: (1) identification of salient medical keywords in (noisy) tweets, (2) mapping drug-effect relationships, and (3) classification of such relationships as adverse or non-adverse.

Methods: We use a bipartite graph-theoretic representation called a drug-effect graph (DEG) to model drug and side effect relationships, representing the drugs and side effects as vertices. We construct individual DEGs from two data sources. The first DEG is constructed from the drug-effect relationships found in FDA package inserts as recorded in the SIDER database. The second DEG is constructed by mining the history of Twitter users. We use dictionary-based information extraction to identify medically relevant concepts in tweets. Drugs and co-occurring symptoms are connected by edges weighted by temporal distance and frequency. Finally, information from the SIDER DEG is integrated with the Twitter DEG, and edges are classified as either adverse or non-adverse using supervised machine learning.

Results: We examine both graph-theoretic and semantic features for the classification task. The proposed approach identifies adverse drug effects with high accuracy, with precision exceeding 85% and F1 exceeding 81%. Compared with state-of-the-art methods that employ un-enriched graph-theoretic analysis alone, our method improves these measures by between 5 and 8%. Additionally, we employ our method to discover several ADEs which, though present in the medical literature and in Twitter streams, are not represented in the SIDER database.

Conclusions: We present a DEG integration model as a powerful formalism for the analysis of drug-effect relationships that is general enough to accommodate diverse data sources, yet rigorous enough to provide a strong mechanism for ADE identification.
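The DEG construction described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy dictionaries `DRUGS` and `EFFECTS`, the 48-hour co-occurrence window, the choice of edge attributes (mention count and smallest time gap), and the use of a plain set as a stand-in for SIDER drug-effect pairs are all assumptions made for illustration; the abstract does not specify how temporal distance and frequency are combined into a weight.

```python
from itertools import product

# Hypothetical toy dictionaries; a real system would use curated medical lexicons.
DRUGS = {"tamoxifen", "atenolol"}
EFFECTS = {"headache", "nausea", "dizziness"}


def extract_concepts(text):
    """Dictionary-based extraction: return the (drugs, effects) named in one tweet."""
    tokens = {tok.strip(".,!?#@").lower() for tok in text.split()}
    return tokens & DRUGS, tokens & EFFECTS


def build_deg(timeline, window_hours=48.0):
    """Build a weighted bipartite drug-effect graph from one user's timeline.

    timeline: list of (timestamp_in_hours, tweet_text) pairs.
    Returns {(drug, effect): {"count": int, "min_gap": float}}, where "count"
    is the co-occurrence frequency and "min_gap" the smallest temporal distance.
    The window and both attributes are illustrative assumptions.
    """
    drug_mentions, effect_mentions = [], []
    for t, text in timeline:
        drugs, effects = extract_concepts(text)
        drug_mentions += [(t, d) for d in drugs]
        effect_mentions += [(t, e) for e in effects]

    edges = {}
    for (td, drug), (te, effect) in product(drug_mentions, effect_mentions):
        gap = te - td
        # Keep only effects mentioned at or after the drug, within the window.
        if 0 <= gap <= window_hours:
            edge = edges.setdefault((drug, effect), {"count": 0, "min_gap": gap})
            edge["count"] += 1
            edge["min_gap"] = min(edge["min_gap"], gap)
    return edges


def edge_features(edges, reference_pairs):
    """Enrich each Twitter edge with a flag: is the pair in the reference DEG?

    reference_pairs is a set of (drug, effect) tuples standing in for SIDER.
    """
    return {
        pair: (attrs["count"], attrs["min_gap"], pair in reference_pairs)
        for pair, attrs in edges.items()
    }


timeline = [
    (0, "Started tamoxifen today"),
    (5, "Ugh, headache again"),
    (30, "Still headache and nausea"),
    (100, "So much dizziness"),
]
deg = build_deg(timeline)
features = edge_features(deg, {("tamoxifen", "nausea")})
```

The resulting feature tuples are the kind of per-edge input a supervised classifier could use to label edges as adverse or non-adverse; the classifier itself is omitted here because the abstract does not name one.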
