Pharmacovigilance from social media: An improved random subspace method for identifying adverse drug events

OBJECTIVE Advances in Web 2.0 technologies have enabled significant strides toward using patient-generated content for pharmacovigilance. Social media-based pharmacovigilance has great potential to augment current efforts and to provide regulatory authorities with valuable decision aids. Among pharmacovigilance activities, identifying adverse drug events (ADEs) is especially important for patient safety. In health-related discussion forums, however, ADEs can be confounded with drug indications, beneficial effects, and other semantic types. The focus of this study is therefore to develop a strategy that distinguishes ADEs from other semantic types and, at the same time, determines which drug each ADE is associated with.

MATERIALS AND METHODS Two groups of features are explored: shallow linguistic features and semantic features. Motivated by the characteristics of these two feature categories in social media-based ADE identification, an improved random subspace method, called Stratified Sampling-based Random Subspace (SSRS), is proposed. Unlike the conventional random subspace method, which selects subspaces by random sampling, SSRS adopts a stratified sampling-based subspace selection strategy.

RESULTS A case study on heart disease discussion forums is performed to evaluate the effectiveness of the SSRS method. Experimental results reveal that the proposed SSRS method significantly outperforms the other ensemble methods compared and existing approaches for ADE identification.

DISCUSSION AND CONCLUSION The proposed method is easy to implement because it is based on two feature sets that can be derived naturally, so no artificial stratum generation is needed. Moreover, SSRS has great potential for application to other high-dimensional problems in which the original data can be represented from two different aspects.
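The stratified subspace selection idea behind SSRS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how many features are drawn from each stratum, so the equal per-stratum sampling rate used here is an assumption, as are the function names, the feature index layout, and the ensemble size. Each sampled subspace would then be used to train one base classifier, which is omitted here.

```python
import random


def stratified_subspace(strata, rate, rng):
    """Draw one feature subspace by sampling each stratum separately.

    strata: list of lists of feature indices, one list per feature group.
    rate:   fraction of each stratum to keep (assumed equal for all strata).
    """
    subspace = []
    for features in strata:
        # Keep at least one feature per stratum so every group is represented.
        k = max(1, round(rate * len(features)))
        subspace.extend(rng.sample(features, k))
    return sorted(subspace)


def build_subspaces(strata, rate, n_members, seed=0):
    """Draw one stratified subspace per ensemble member."""
    rng = random.Random(seed)
    return [stratified_subspace(strata, rate, rng) for _ in range(n_members)]


# Hypothetical layout: indices 0-7 are shallow linguistic features,
# indices 8-11 are semantic features.
shallow = list(range(0, 8))
semantic = list(range(8, 12))

subspaces = build_subspaces([shallow, semantic], rate=0.5, n_members=5)
for s in subspaces:
    print(s)
```

With plain random sampling over all twelve indices, a subspace could by chance contain no semantic features at all; sampling each group separately guarantees that every ensemble member sees both kinds of feature.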
