Comparison of Machine Learning Techniques with Classical Statistical Models in Predicting Health Outcomes

Several machine learning techniques (multilayer and single-layer perceptrons, logistic regression, least squares linear separation, and support vector machines) were applied to calculate the risk of death from two biomedical data sets: one from patient care records and another from a population survey. Each data set contained multiple sources of information: history of related symptoms and other illnesses, physical examination findings, laboratory tests, medications (patient records data set), health attitudes, and disabilities in activities of daily living (survey data set). Each technique showed very good mortality prediction in the acute patient data sample (AUC up to 0.89) and fair prediction accuracy for six-year mortality (AUC from 0.70 to 0.76) in individuals from the epidemiological survey database. The results suggest that the nature of the data matters more than the choice of learning technique. However, the consistently superior performance of the artificial neural network (multilayer perceptron) indicates that nonlinear relationships, which linear separation techniques cannot discern, can further improve the prediction of health outcomes.
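The AUC values quoted above can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that pairwise (Mann-Whitney) computation, not the authors' evaluation code; the function name `auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases.

    labels: 1 for death, 0 for survival (illustrative encoding, an assumption).
    scores: predicted risk from any classifier; higher means higher risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 deaths and 5 survivors with made-up risk scores.
died = [1, 1, 1, 0, 0, 0, 0, 0]
risk = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.2, 0.1]
print(round(auc(died, risk), 3))  # 13 of 15 pairs ranked correctly -> 0.867
```

Because the measure depends only on how cases are ranked, it allows classifiers with differently scaled outputs (for example, a perceptron and a logistic regression) to be compared on the same footing.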
