Comparison of MetaMap and cTAKES for entity extraction in clinical notes

Background: Clinical notes such as discharge summaries have a semi-structured or unstructured format. These documents contain information about diseases, treatments, drugs, and so on, but their narrative format makes extracting meaningful information from them challenging. In this context, we aimed to compare how well two tools, MetaMap and cTAKES, automatically extract medical entities.

Methods: We worked with the i2b2 (Informatics for Integrating Biology to the Bedside) Obesity Challenge data. Two experiments were constructed. In the first, only one UMLS concept related to each annotated disease was extracted. In the second, several UMLS concepts were aggregated.

Results: Results were evaluated against manually annotated medical entities. Aggregating concepts improved the results. MetaMap achieved an average recall of 0.88, precision of 0.89, and F-score of 0.88. cTAKES achieved an average recall, precision, and F-score of 0.91, 0.89, and 0.89, respectively.

Conclusions: Aggregating concepts (with similar and different semantic types) was shown to be a good strategy for improving the extraction of medical entities, and automatic aggregation could be considered in future work.
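The two experimental settings and the evaluation metrics can be illustrated with a minimal sketch. It is not the authors' pipeline: the concept identifiers (`CUI_A`, `CUI_B`, `CUI_C`), the document ids, the toy data, and the helper names `prf` and `docs_with_any` are all hypothetical, and scoring at the document level is an assumption made for illustration. The sketch only shows how matching a set of aggregated concepts, rather than a single concept, can raise recall against a manual annotation.

```python
from typing import Dict, Set, Tuple


def prf(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for two sets of document ids."""
    tp = len(predicted & gold)  # documents flagged by the tool and by annotators
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def docs_with_any(extracted: Dict[str, Set[str]], concepts: Set[str]) -> Set[str]:
    """Documents in which the tool extracted at least one of the given concepts."""
    return {doc for doc, found in extracted.items() if found & concepts}


# Hypothetical tool output: document id -> concept identifiers extracted from it.
extracted = {
    "doc1": {"CUI_A"},
    "doc2": {"CUI_B"},  # a related concept for the same disease
    "doc3": {"CUI_C"},  # an unrelated concept
    "doc4": set(),      # the tool found nothing
}
# Hypothetical manual annotation: documents that mention the disease.
gold = {"doc1", "doc2", "doc4"}

# Experiment 1 (assumed form): match a single concept per disease.
single = prf(docs_with_any(extracted, {"CUI_A"}), gold)
# Experiment 2 (assumed form): match any concept in an aggregated set.
aggregated = prf(docs_with_any(extracted, {"CUI_A", "CUI_B"}), gold)

print("single     P=%.2f R=%.2f F=%.2f" % single)
print("aggregated P=%.2f R=%.2f F=%.2f" % aggregated)
```

In this toy data the aggregated set recovers `doc2`, which the single concept misses, so recall and F-score rise while precision is unchanged. Real gains depend on which concepts are aggregated and on how the annotation is defined.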
