An evaluation of machine-learning methods for predicting pneumonia mortality

This paper describes the application of eight statistical and machine-learning methods to derive computer models that predict the mortality of hospital patients with pneumonia from their findings at initial presentation. Each of the eight models was constructed from 9847 patient cases and evaluated on 4352 additional cases. The primary evaluation metric was the error in predicted survival as a function of the fraction of patients predicted to survive. This metric is useful for assessing a model's potential to help a clinician decide whether to treat a given patient in the hospital or at home. We examined the models' error rates for survival fractions between 0.1 and 0.6. Over this range, each model's predictive error rate was within 1% of the error rate of every other model. When predicting that approximately 30% of the patients will survive, all the models have an error rate of less than 1.5%. The models are distinguished more by the number of variables and parameters they contain than by their error rates; these differences suggest which models may be the most amenable to future implementation as paper-based guidelines.
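The evaluation metric can be illustrated with a minimal sketch. It rests on an assumed reading of the abstract, not on the authors' code: patients are ranked by a model's predicted mortality risk, the lowest-risk fraction is predicted to survive, and the error rate is the share of those predicted survivors who actually died. The function name `error_at_survival_fraction` and the toy data are hypothetical.

```python
from typing import Sequence


def error_at_survival_fraction(
    risk: Sequence[float], died: Sequence[bool], fraction: float
) -> float:
    """Error rate among the lowest-risk `fraction` of patients.

    Assumed definition: the patients with the lowest predicted mortality
    risk are predicted to survive; the error is the proportion of those
    patients who in fact died.
    """
    if len(risk) != len(died):
        raise ValueError("risk and died must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    n = len(risk)
    k = round(n * fraction)  # number of patients predicted to survive
    if k == 0:
        raise ValueError("fraction is too small for this sample size")

    # Rank patient indices from lowest to highest predicted risk.
    order = sorted(range(n), key=lambda i: risk[i])
    predicted_survivors = order[:k]

    deaths = sum(1 for i in predicted_survivors if died[i])
    return deaths / k


# Toy data (hypothetical, not from the study).
toy_risk = [0.05, 0.9, 0.1, 0.4, 0.2, 0.7, 0.3, 0.6, 0.15, 0.8]
toy_died = [False, True, False, False, True, True, False, False, False, True]

for f in (0.3, 0.5, 1.0):
    print(f, error_at_survival_fraction(toy_risk, toy_died, f))
```

Sweeping `fraction` over a range such as 0.1 to 0.6 and plotting the result for each model would give curves of the kind the abstract compares, under the assumed definition above.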
