Ensemble approach for identifying medical concepts with special attention to lexical scope

Many health-care services are built on information extraction techniques. This extraction process is laborious and time-consuming, given the limited availability of medical experts. We were therefore motivated to develop an automated extraction system that identifies both medical and non-medical concepts. These concepts help extract key information from medical corpora, and non-medical terms are as important for diagnosis as their medical counterparts. Hence, we employed three approaches to identify medical and non-medical terms (words/phrases): an unsupervised module, a supervised module, and an ensemble that combines them. The unsupervised module has two phases: part-of-speech (POS) tagging, followed by lookup in a domain-specific lexicon, WordNet of Medical Event (WME 3.0). The supervised module uses two machine learning classifiers, Naive Bayes and Conditional Random Field (CRF), with features such as category, POS, and sentiment. Finally, we combined the key outputs of the unsupervised and supervised modules into two versions of an ensemble module (Ensemble-I and Ensemble-II). All modules identify medical concepts of any length (uni-gram, bi-gram, tri-gram, and longer) and separate non-medical words or phrases in context. To evaluate the modules, we prepared an experimental dataset split into training, development, and test sets. The ensemble modules outperform the individual modules, and Ensemble-I outperforms Ensemble-II in identifying medical concepts across all n-gram lengths. Both ensemble modules achieve F-measures of 0.91 for identifying medical concepts and 0.94 for identifying non-medical words/phrases.
This research reports initial steps toward an automated concept identification framework for health care. Such a framework can assist in designing domain-specific applications such as annotation, categorization, and recommendation systems.
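The abstract describes n-gram lexicon lookup, ensemble combination, and F-measure evaluation only at a high level. The Python sketch below illustrates one possible reading under loudly stated assumptions: a small in-memory set stands in for WME 3.0, the POS-tagging phase is skipped, matching is greedy longest-match up to a hypothetical `max_n`, and `union_ensemble` is a hypothetical combiner (union of both outputs, longer span wins on overlap), not the paper's Ensemble-I or Ensemble-II. `f_measure` is the standard exact-span F1.

```python
"""Toy sketch: lexicon n-gram matching, a simple span ensemble, and F1.

Assumptions (not from the paper): a small in-memory set stands in for
WME 3.0, POS tagging is skipped, and union_ensemble is a hypothetical
combiner rather than Ensemble-I or Ensemble-II.
"""
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int]  # half-open [start, end) token indices


def match_concepts(tokens: List[str], lexicon: Set[str], max_n: int = 5) -> List[Span]:
    """Greedy longest-match lookup of n-grams (n = max_n down to 1)."""
    lowered = [t.lower() for t in tokens]
    spans: List[Span] = []
    i = 0
    while i < len(lowered):
        for n in range(min(max_n, len(lowered) - i), 0, -1):
            # Longest candidate first, so "heart attack" wins over "heart".
            if " ".join(lowered[i:i + n]) in lexicon:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here: token treated as non-medical
    return spans


def union_ensemble(a: Iterable[Span], b: Iterable[Span]) -> List[Span]:
    """Hypothetical combiner: union of both outputs, longer span wins on overlap."""
    chosen: List[Span] = []
    # Visit longer spans first (ties broken by start index).
    for span in sorted(set(a) | set(b), key=lambda s: (s[0] - s[1], s[0])):
        if all(span[1] <= c[0] or span[0] >= c[1] for c in chosen):
            chosen.append(span)
    return sorted(chosen)


def f_measure(predicted: Set[Span], gold: Set[Span]) -> float:
    """Exact-span F1: harmonic mean of precision and recall."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = "The patient reported chest pain after a heart attack".split()
    lexicon = {"chest pain", "heart"}  # toy stand-in for WME 3.0
    unsupervised = match_concepts(tokens, lexicon)  # [(3, 5), (7, 8)]
    supervised = [(7, 9)]  # pretend output of a trained CRF
    ensemble = union_ensemble(unsupervised, supervised)  # [(3, 5), (7, 9)]
    gold = {(3, 5), (7, 9)}  # "chest pain", "heart attack"
    for name, spans in [("unsupervised", unsupervised),
                        ("supervised", supervised),
                        ("ensemble", ensemble)]:
        print(name, round(f_measure(set(spans), gold), 2))
    # unsupervised 0.5 / supervised 0.67 / ensemble 1.0
```

Preferring the longer span on overlap reflects the abstract's emphasis on multi-word (bi-gram, tri-gram, and longer) concepts. The printed scores come from the single toy sentence only and say nothing about the reported 0.91 and 0.94 F-measures.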