Classification algorithms applied to narrative reports

Narrative text reports represent a significant source of clinical data. However, the information stored in these reports is inaccessible to many automated decision support systems. Data mining techniques can assist in extracting information from narrative data. Several classification methods, including rule generation, decision trees, Bayesian classifiers, and information retrieval, were used to classify a set of 200 chest X-ray reports according to six clinical conditions indicated in the reports. A general-purpose natural language processor was used to convert the narrative text into a coded form that the classification algorithms could use. Significant differences in performance were found between algorithms. The best-performing algorithm applied to the processor output was significantly better than information retrieval applied to raw text. The predictor variables taken from the coded processor output were limited to avoid overfitting. Methods that limited the variables using domain knowledge performed significantly better than those that limited them using the conditional probabilities of the variables in the training set. Algorithm performance was also shown to depend on training set size.
