Automated encoding of clinical documents based on natural language processing.

OBJECTIVE The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method.

METHODS An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching the structured output generated by MedLEE, which consists of findings and modifiers, to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts.

RESULTS Recall of the system for UMLS coding of all terms was .77 (95% CI, .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91.

CONCLUSION Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
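The matching step described under METHODS (a finding plus its modifiers mapped to the most specific code) can be illustrated with a minimal sketch. This is not MedLEE's implementation: the code table, the code names, and the function `most_specific_code` are hypothetical, and "most specific" is assumed here to mean the table entry for the finding whose required modifiers are all present and most numerous.

```python
from typing import Iterable, Optional

# Hypothetical code table (not real UMLS content): each entry pairs a finding
# and the modifiers it requires with a made-up code identifier.
CODE_TABLE = [
    ("pain", frozenset(), "CODE_PAIN"),
    ("pain", frozenset({"chest"}), "CODE_CHEST_PAIN"),
    ("pain", frozenset({"chest", "severe"}), "CODE_SEVERE_CHEST_PAIN"),
    ("mass", frozenset(), "CODE_MASS"),
]


def most_specific_code(finding: str, modifiers: Iterable[str]) -> Optional[str]:
    """Return the code whose required modifiers are all present in the
    structured output and which requires the most modifiers (assumed
    definition of 'most specific'); return None if the finding is unknown."""
    present = set(modifiers)
    best_code = None
    best_size = -1
    for entry_finding, required, code in CODE_TABLE:
        # An entry is a candidate only if every modifier it requires is present.
        if entry_finding == finding and required <= present:
            if len(required) > best_size:
                best_code, best_size = code, len(required)
    return best_code


if __name__ == "__main__":
    # "left" has no matching entry, so the match falls back to the chest-pain code.
    print(most_specific_code("pain", {"chest", "left"}))
```

Modifiers that no table entry accounts for (such as "left" above) are simply left over in this sketch; the abstract notes that the actual method keeps such related information alongside the code.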
