Evaluating Natural Language Processors in the Clinical Domain

Evaluating natural language processing (NLP) systems in the clinical domain is difficult, yet it is important for the advancement of the field. A number of NLP systems that extract information from free-text clinical reports have been reported, but few of them have been evaluated. The evaluations that were carried out reported good performance, but the results were often weakened by ineffective evaluation methods. In this paper we describe a set of criteria aimed at improving the quality of NLP evaluation studies. We present an overview of NLP evaluations in the clinical domain and also discuss the Message Understanding Conferences (MUC) [1-4]. Although these conferences constitute a series of NLP evaluation studies performed outside the clinical domain, some of their results are relevant to medicine. In addition, we discuss a number of factors that contribute to the inherent complexity of evaluating natural language systems.

[1]  Ralph Grishman, et al.  Design of the MUC-6 evaluation, 1995, MUC.

[2]  Naomi Sager, et al.  Research Paper: Natural Language Processing and the Representation of Clinical Data, 1994, J. Am. Medical Informatics Assoc.

[3]  J. Yerushalmy The statistical assessment of the variability in observer perception and description of roentgenographic pulmonary shadows, 1969, Radiologic clinics of North America.

[4]  Christian Lovis, et al.  Natural Language Processing and Clinical Support to Improve the Quality of Reimbursement Claim Databases, 1996.

[5]  L A Lenert, et al.  Monitoring free-text data using medical language processing, 1993, Computers and biomedical research, an international journal.

[6]  D M Rind, et al.  Designing studies of computer-based alerts and reminders, 1995, M.D. computing: computers in medical practice.

[7]  Charles Safran, et al.  Using an Electronic Medical Record to Perform Clinical Research on Mitral Valve Prolapse and Panic/Anxiety Disorder, 1995.

[8]  Christian Lovis, et al.  A Semi-Automatic ICD Encoder, 1996.

[9]  G W Moore, et al.  Automatic SNOMED coding, 1994, Proceedings. Symposium on Computer Applications in Medical Care.

[10]  S Shiffman, et al.  A free-text processing system to capture physical findings: Canonical Phrase Identification System (CAPIS), 1991, Proceedings. Symposium on Computer Applications in Medical Care.

[11]  George Hripcsak, et al.  Two Applications of Statistical Modelling to Natural Language Processing, 1995, AISTATS.

[12]  C A Sneiderman, et al.  Finding the findings: identification of findings in medical literature using restricted natural language processing, 1996, Proceedings: a conference of the American Medical Informatics Association. AMIA Fall Symposium.

[13]  Carol Friedman, et al.  Research Paper: A General Natural-language Text Processor for Clinical Radiology, 1994, J. Am. Medical Informatics Assoc.

[14]  J R Scherrer, et al.  Natural Language Processing and Semantical Representation of Medical Texts, 1992, Methods of Information in Medicine.

[15]  Peter Spyns Natural Language Processing in Medicine: An Overview , 1996, Methods of Information in Medicine.

[16]  N L Jain, et al.  Identification of suspected tuberculosis patients based on natural language processing of chest radiograph reports, 1996, Proceedings: a conference of the American Medical Informatics Association. AMIA Fall Symposium.

[17]  Craig A. Will, et al.  Comparing Human and Machine Performance for Natural Language Information Extraction: Results from the Tipster Text Evaluation, 1993, TIPSTER.

[18]  C Lovis, et al.  Analysis of medical texts based on a sound medical model, 1995, Proceedings. Symposium on Computer Applications in Medical Care.

[19]  W. DuMouchel, et al.  Unlocking Clinical Data from Narrative Reports: A Study of Natural Language Processing, 1995, Annals of Internal Medicine.

[20]  Frederick Hayes-Roth, et al.  Building expert systems, 1983, Advanced book program.

[21]  Beth M. Sundheim The Message Understanding Conferences , 1996, TIPSTER.

[22]  W R Hersh, et al.  Automated application of clinical practice guidelines for asthma management, 1996, Proceedings: a conference of the American Medical Informatics Association. AMIA Fall Symposium.

[23]  Allen C. Browne, et al.  Lexical methods for managing variation in biomedical terminologies, 1994, Proceedings. Symposium on Computer Applications in Medical Care.

[24]  Peter J. Haug, et al.  Development and evaluation of a computerized admission diagnoses encoding system, 1996, Computers and biomedical research, an international journal.

[25]  J R Scherrer, et al.  The Application of Natural-language Processing to Healthcare Quality Assessment, 1991, Medical decision making: an international journal of the Society for Medical Decision Making.

[26]  T C Rindflesch, et al.  Semantic processing in information retrieval, 1993, Proceedings. Symposium on Computer Applications in Medical Care.

[27]  D. Spiegelhalter, et al.  Evaluating medical expert systems: what to test and how?, 1990, Medical informatics = Medecine et informatique.

[28]  Norman E. Sondak, et al.  Computers and medicine, 1979.

[29]  Leslie A. Lenert, et al.  Automated Linkage of Free-text Descriptions of Patients with a Practice Guideline, 1994.

[30]  George Hripcsak, et al.  Natural language processing in an operational clinical information system, 1995, Natural Language Engineering.

[31]  E. DeLong, et al.  Discordance of Databases Designed for Claims Payment versus Clinical Information Systems: Implications for Outcomes Research, 1993, Annals of Internal Medicine.

[32]  D A Evans, et al.  Automatic Indexing of Abstracts via Natural-language Processing Using a Simple Thesaurus, 1991, Medical decision making: an international journal of the Society for Medical Decision Making.

[33]  P. Haug, et al.  Computerized extraction of coded findings from free-text radiologic reports. Work in progress, 1990, Radiology.

[34]  N L Jain, et al.  Respiratory Isolation of Tuberculosis Patients Using Clinical Guidelines and an Automated Clinical Decision Support System, 1998, Infection Control & Hospital Epidemiology.

[35]  P J Haug, et al.  Development and evaluation of a computerized admission diagnoses encoding system, 1996, Computers and biomedical research, an international journal.

[36]  P Zweigenbaum, et al.  A multi-lingual architecture for building a normalised conceptual representation from medical language, 1995, Proceedings. Symposium on Computer Applications in Medical Care.

[37]  D A Evans, et al.  Empirical, automated vocabulary discovery using large text corpora and advanced natural language processing tools, 1996, Proceedings: a conference of the American Medical Informatics Association. AMIA Fall Symposium.

[38]  Nicoletta Calzolari, et al.  Review of Medical language processing: computer management of narrative data by Naomi Sager, Carol Friedman, and Margaret S. Lyman. Addison-Wesley 1987, 1989.