Filtering Entities to Optimize Identification of Adverse Drug Reaction From Social Media: How Can the Number of Words Between Entities in the Messages Help?

Background: With the increasing popularity of Web 2.0 applications, social media has made it possible for individuals to post messages about adverse drug reactions (ADRs). In such online conversations, patients discuss their symptoms, medical history, and diseases. The disorders they mention may be ADRs or any other medical condition. Methods are therefore needed to distinguish true ADR reports from false positives.

Objective: The aim of this study was to investigate a method for filtering out disorder terms that do not correspond to adverse events, using the distance (in number of words) between the drug term and the disorder or symptom term in the post. We hypothesized that the shorter the distance between the disorder name and the drug name, the higher the probability that the disorder is an ADR.

Methods: We analyzed a corpus of 648 messages from 5 French forums, corresponding to a total of 1654 (drug, disorder) pairs, using Gaussian mixture models fitted with an expectation-maximization (EM) algorithm.

Results: The distribution of the distances between the drug term and the disorder term enabled the filtering of 50.03% (733/1465) of the disorders that were not ADRs. Our filtering strategy achieved a precision of 95.8% and a recall of 50.0%.

Conclusions: This study suggests that the distance between drug and disorder terms can be used to identify false positives, thereby improving ADR detection in social media.
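As a rough illustration of the kind of analysis described in the Methods, the sketch below fits a one-dimensional, two-component Gaussian mixture to word distances with EM and computes the posterior probability that a pair belongs to the short-distance component. It is not the authors' implementation: the number of components, the quartile-based initialization, the synthetic data, and all function names (`fit_two_component_gmm`, `posterior_short`, `make_synthetic_distances`) are assumptions made for this example, and the abstract does not state the decision rule actually used.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(xs, n_iter=200):
    """Fit a 1-D, two-component Gaussian mixture by EM.

    Returns (weights, means, stds), ordered so component 0 has the smaller mean.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Crude initialization (an assumption): quartiles as means, pooled spread.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    centre = sum(xs) / n
    spread = max(math.sqrt(sum((x - centre) ** 2 for x in xs) / n), 1e-3)
    stds = [spread, spread]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow to zero
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weights, means, and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component

    order = sorted(range(2), key=lambda k: means[k])
    return ([weights[k] for k in order],
            [means[k] for k in order],
            [stds[k] for k in order])


def posterior_short(x, params):
    """Posterior probability that distance x comes from the short-distance component."""
    weights, means, stds = params
    p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
    total = (p[0] + p[1]) or 1e-300
    return p[0] / total


def make_synthetic_distances(seed=0, n=300):
    """Synthetic word distances (NOT the study data): a short and a long cluster."""
    rng = random.Random(seed)
    short = [max(1, round(rng.gauss(6, 2))) for _ in range(n)]
    long_ = [max(1, round(rng.gauss(40, 10))) for _ in range(n)]
    return short + long_


if __name__ == "__main__":
    distances = make_synthetic_distances()
    params = fit_two_component_gmm(distances)
    print("weights, means, stds:", params)
    # Hypothetical filtering rule: keep pairs likely to be in the short-distance component.
    kept = [d for d in distances if posterior_short(d, params) >= 0.5]
    print("pairs kept:", len(kept), "of", len(distances))
```

Word distances are non-negative integers, so a Gaussian mixture is only an approximation here; the 0.5 posterior cutoff is likewise a placeholder that would need to be tuned against annotated pairs before any precision or recall figure could be compared with those reported above.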
