Evaluation of Medical Concept Annotation Systems on Clinical Records

Large volumes of electronic health records, including free-text documents, are generated across many sectors of healthcare. Medical concept annotation systems are designed to enrich these documents with key domain concepts drawn from reference terminologies. Although a wide range of annotation systems exists, there is a lack of comparative analysis that enables a thorough understanding of the effectiveness of both the concept extraction and the concept recognition components of these systems, especially within the clinical domain. This paper analyses and evaluates four annotation systems (MetaMap, NCBO Annotator, Ontoserver, and QuickUMLS) on the task of extracting medical concepts from clinical free-text documents. Empirical findings show that the annotators exhibit different strengths in terms of overall precision and recall. The concept recognition component of each system, however, was found to be highly sensitive to the quality of the text spans produced by its concept extraction component. The effects of these components on each other are quantified so as to provide evidence for an informed choice of annotation system, as well as avenues for future research.
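To make the distinction between the two components concrete, the sketch below scores a hypothetical annotator at three levels: concept extraction (the text span alone must match), end-to-end annotation (span and concept identifier must both match), and concept recognition accuracy measured only on spans that were extracted correctly. The `Annotation` type, the `evaluate` and `precision_recall` functions, the exact-offset matching criterion, and the placeholder identifiers (`C_A`, `C_B`, ...) are all assumptions made for illustration; this is not presented as the evaluation protocol used in the paper.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotation: character offsets plus a concept identifier."""
    start: int
    end: int
    concept: str  # placeholder for a terminology code (e.g. a UMLS CUI)


def precision_recall(system: set, gold: set) -> tuple[float, float]:
    """Exact-match precision and recall of a system set against a gold set."""
    tp = len(system & gold)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def evaluate(system: list[Annotation], gold: list[Annotation]) -> dict[str, float]:
    # Concept extraction: only the (start, end) span has to match.
    sys_spans = {(a.start, a.end) for a in system}
    gold_spans = {(a.start, a.end) for a in gold}
    ext_p, ext_r = precision_recall(sys_spans, gold_spans)

    # End to end: span AND concept identifier must both match.
    e2e_p, e2e_r = precision_recall(set(system), set(gold))

    # Concept recognition in isolation: accuracy on correctly extracted spans.
    gold_by_span = {(a.start, a.end): a.concept for a in gold}
    matched = [a for a in system if (a.start, a.end) in gold_by_span]
    correct = sum(a.concept == gold_by_span[(a.start, a.end)] for a in matched)
    rec_acc = correct / len(matched) if matched else 0.0

    return {
        "extraction_precision": ext_p,
        "extraction_recall": ext_r,
        "end_to_end_precision": e2e_p,
        "end_to_end_recall": e2e_r,
        "recognition_accuracy": rec_acc,
    }


# Toy data: one fully correct annotation, one wrong concept on a correct span,
# and one correct concept on a span that is off by one character.
gold = [Annotation(10, 20, "C_A"), Annotation(30, 35, "C_B"), Annotation(40, 45, "C_C")]
system = [Annotation(10, 20, "C_A"), Annotation(30, 35, "C_X"), Annotation(41, 45, "C_C")]

result = evaluate(system, gold)
print({k: round(v, 2) for k, v in result.items()})
# {'extraction_precision': 0.67, 'extraction_recall': 0.67,
#  'end_to_end_precision': 0.33, 'end_to_end_recall': 0.33,
#  'recognition_accuracy': 0.5}
```

In this toy example the third system annotation carries the right concept but its span is off by one character, so it is still counted as a miss end to end. Separating the three scores makes that dependence of recognition on extraction quality visible rather than folding it into a single figure.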