Multi-layered Learning for Information Extraction from Adverse Drug Event Narratives

Recognizing named entities in Adverse Drug Reactions narratives is a crucial step towards extracting valuable patient information from unstructured text and transforming the information into an easily processable structured format. This motivates using advanced data analytics to support data-driven pharmacovigilance. Yet existing biomedical named entity recognition (NER) tools are limited in their ability to identify certain entity types from these domain-specific narratives, resulting in poor accuracy. To address this shortcoming, we propose our novel methodology called Tiered Ensemble Learning System with Diversity (TELS-D), an ensemble approach that integrates a rich variety of named entity recognizers to procure the final result. There are two specific challenges faced by biomedical NER: the classes are imbalanced and the lack of a single best performing method. The first challenge is addressed through a balanced, under-sampled bagging strategy that depends on the imbalance level to overcome this highly skewed data problem. To address the second challenge, we design an ensemble of heterogeneous entity recognizers that leverages a novel ensemble combiner. Our experimental results demonstrate that for biomedical text datasets: (i) a balanced learning environment combined with an ensemble of heterogeneous classifiers consistently improves the performance over individual base learners and (ii) stacking-based ensemble combiner methods outperform simple majority voting based solutions by 0.3 in F1-score.

[1]  Özlem Uzuner,et al.  Machine learning and rule-based approaches to assertion classification. , 2009, Journal of the American Medical Informatics Association : JAMIA.

[2]  David A. Ferrucci,et al.  UIMA: an architectural approach to unstructured information processing in the corporate research environment , 2004, Natural Language Engineering.

[3]  Alan R. Aronson,et al.  Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program , 2001, AMIA.

[4]  Nitesh V. Chawla,et al.  Data Mining for Imbalanced Datasets: An Overview , 2005, The Data Mining and Knowledge Discovery Handbook.

[5]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[6]  Dina Demner-Fushman,et al.  Biomedical Text Mining: A Survey of Recent Progress , 2012, Mining Text Data.

[7]  Rosa Maria Valdovinos,et al.  New Applications of Ensembles of Classifiers , 2003, Pattern Analysis & Applications.

[8]  Hong Yu,et al.  Bidirectional RNN for Medical Event Detection in Electronic Health Records , 2016, NAACL.

[9]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[10]  Carol Friedman,et al.  Research Paper: A General Natural-language Text Processor for Clinical Radiology , 1994, J. Am. Medical Informatics Assoc..

[11]  Vipin Kumar,et al.  Introduction to Data Mining , 2022, Data Mining and Machine Learning Applications.

[12]  Fei Xia,et al.  Community annotation experiment for ground truth generation for the i2b2 medication challenge , 2010, J. Am. Medical Informatics Assoc..

[13]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[14]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[15]  Son Doan,et al.  Application of information technology: MedEx: a medication information extraction system for clinical narratives , 2010, J. Am. Medical Informatics Assoc..

[16]  Fei Xia,et al.  A cascade of classifiers for extracting medication information from discharge summaries , 2011, J. Biomed. Semant..

[17]  Omid Ghiasvand,et al.  Disease Name Extraction from Clinical Text Using Conditional Random Fields , 2014 .

[18]  Ethem Alpaydin,et al.  Introduction to machine learning , 2004, Adaptive computation and machine learning.

[19]  R. Schapire The Strength of Weak Learnability , 1990, Machine Learning.

[20]  Francisco Herrera,et al.  A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[21]  Lehana Thabane,et al.  Application of data mining techniques in pharmacovigilance. , 2003, British journal of clinical pharmacology.

[22]  Xin Yao,et al.  Diversity analysis on imbalanced data sets by using ensemble models , 2009, 2009 IEEE Symposium on Computational Intelligence and Data Mining.

[23]  Özlem Uzuner,et al.  Extracting medication information from clinical text , 2010, J. Am. Medical Informatics Assoc..

[24]  N. S. Bhutada,et al.  Assessing Pancreatic Cancer Risk Associated with Dipeptidyl Peptidase 4 Inhibitors: Data Mining of FDA Adverse Event Reporting System (FAERS) , 2013 .

[25]  P. Corey,et al.  Incidence of Adverse Drug Reactions in Hospitalized Patients , 2012 .

[26]  Jerzy Stefanowski,et al.  Extending Bagging for Imbalanced Data , 2013, CORES.

[27]  Hoang Nguyen,et al.  Text Mining in Clinical Domain: Dealing with Noise , 2016, KDD.

[28]  Eugene Charniak,et al.  Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking , 2005, ACL.

[29]  Nathalie Japkowicz,et al.  The class imbalance problem: A systematic study , 2002, Intell. Data Anal..

[30]  Son Doan,et al.  Recognizing Medication related Entities in Hospital Discharge Summaries using Support Vector Machine , 2010, COLING.

[31]  Peer Bork,et al.  The SIDER database of drugs and side effects , 2015, Nucleic Acids Res..

[32]  Hong Yu,et al.  Automatically Recognizing Medication and Adverse Event Information From Food and Drug Administration’s Adverse Event Reporting System Narratives , 2014, JMIR medical informatics.

[33]  Sunghwan Sohn,et al.  Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications , 2010, J. Am. Medical Informatics Assoc..

[34]  Nikunj C. Oza,et al.  Online Ensemble Learning , 2000, AAAI/IAAI.

[35]  Elke A. Rundensteiner,et al.  One Size Does Not Fit All: An Ensemble Approach Towards Information Extraction from Adverse Drug Event Narratives , 2018, HEALTHINF.

[36]  Yen S. Low,et al.  Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art , 2014, Drug Safety.