SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media

With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which are a serious health problem. Besides ADEs, other semantic relation types (e.g., drug indication and beneficial effect) can hold between drug and adverse event mentions, which makes ADE relation extraction (distinguishing ADE relations from other relation types) necessary. However, ADE relation extraction in a social media environment is not a trivial task, for two reasons: the annotation process is expertise-dependent, time-consuming, and costly, and the feature space is high-dimensional owing to the intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction from patient-generated content in social media that performs better than previous efforts. To achieve this objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploits various lexical, semantic, and syntactic features and integrates ensemble learning with semi-supervised learning. A series of experiments was conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and show that the proposed framework outperforms most existing ADE relation extraction methods. SSEL-ADE can improve ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential to be applied effectively to other social media-based problems.
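As a rough illustration of how ensemble learning and semi-supervised learning can be combined, the sketch below trains a committee of simple classifiers on random feature subspaces and then grows the labeled set with unlabeled examples on which the committee agrees. It is not the SSEL-ADE algorithm itself: the nearest-centroid base learner, the agreement threshold, the function names, and the toy numeric features are all assumptions made for illustration, since the abstract does not specify these details.

```python
import random
from collections import Counter


def train_centroid(X, y, feats):
    """Fit a nearest-centroid classifier restricted to the feature indices in `feats`."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_centroid(model, feats, row):
    """Return the class whose centroid is closest to `row` in the subspace."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, model[c]))
    return min(sorted(model), key=dist)


def fit_committee(X, y, n_members, subspace_size, rng):
    """Ensemble step: one base learner per randomly drawn feature subspace."""
    n_features = len(X[0])
    committee = []
    for _ in range(n_members):
        feats = rng.sample(range(n_features), subspace_size)
        committee.append((feats, train_centroid(X, y, feats)))
    return committee


def vote(committee, row):
    """Majority vote; also returns the fraction of members that agree."""
    votes = Counter(predict_centroid(m, f, row) for f, m in committee)
    label, n = votes.most_common(1)[0]
    return label, n / len(committee)


def self_train(X_lab, y_lab, X_unlab, rounds=3, n_members=5,
               subspace_size=2, threshold=0.8, seed=0):
    """Semi-supervised step: add confidently voted unlabeled rows to the labeled set."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        committee = fit_committee(X_lab, y_lab, n_members, subspace_size, rng)
        remaining = []
        for row in pool:
            label, agreement = vote(committee, row)
            if agreement >= threshold:
                X_lab.append(row)
                y_lab.append(label)
            else:
                remaining.append(row)
        pool = remaining
    # Retrain on the enlarged labeled set and return the final committee.
    return fit_committee(X_lab, y_lab, n_members, subspace_size, rng)


# Toy data (hypothetical): label 1 stands for an ADE relation, 0 for any other relation.
X_labeled = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
y_labeled = [0, 1]
X_unlabeled = [[0.5, 0.2, 0.1], [4.7, 5.2, 4.9], [0.1, 0.4, 0.3], [5.1, 4.8, 5.3]]

committee = self_train(X_labeled, y_labeled, X_unlabeled)
print(vote(committee, [0.2, 0.1, 0.3]))
print(vote(committee, [4.8, 5.1, 5.0]))
```

In a real setting the rows would be feature vectors derived from lexical, semantic, and syntactic cues for each drug and adverse-event mention pair, and the base learner would be a stronger classifier. Random subspaces are one common way to cope with a high-dimensional feature space, and committee agreement is one common way to decide which unlabeled examples are safe to pseudo-label.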
