Paper information - Practice of Epidemiology: Mortality Risk Score Prediction in an Elderly Population Using Machine Learning

Standard practice for prediction often relies on parametric regression methods. Interesting new methods from the machine learning literature have been introduced in epidemiologic studies, such as random forest and neural networks. However, a priori, an investigator will not know which algorithm to select and may wish to try several. Here I apply the super learner, an ensembling machine learning approach that combines multiple algorithms into a single algorithm and returns a prediction function with the best cross-validated mean squared error. Super learning is a generalization of stacking methods. I used super learning in the Study of Physical Performance and Age-Related Changes in Sonomans (SPPARCS) to predict death among 2,066 residents of Sonoma, California, aged 54 years or more during the period 1993–1999. The super learner for predicting death (risk score) improved upon all single algorithms in the collection of algorithms, although its performance was similar to that of several algorithms. Super learner outperformed the worst algorithm (neural networks) by 44% with respect to estimated cross-validated mean squared error and had an R² value of 0.201. The improvement of super learner over random forest with respect to R² was approximately 2-fold. Alternatives for risk score prediction include the super learner, which can provide improved performance.
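
The idea in the abstract, combining several candidate algorithms with weights chosen to minimize cross-validated mean squared error, can be illustrated with a minimal sketch. It is not the paper's implementation. The three toy learners (`fit_mean`, `fit_linear`, `fit_knn`), the synthetic data, and the coarse grid search over convex weights are all assumptions made for illustration. Super learner implementations typically fit the weights with a constrained regression instead, and the study used a different collection of algorithms. The sketch also assumes the usual definition R² = 1 − MSE / Var(Y).

```python
import random

# Hypothetical candidate learners: each takes training data and returns a predictor.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=15):
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

LEARNERS = {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def cv_predictions(xs, ys, n_folds=5):
    """Out-of-fold predictions: each point is predicted by models that never saw it."""
    n = len(xs)
    preds = {name: [0.0] * n for name in LEARNERS}
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for name, fit in LEARNERS.items():
            model = fit(tx, ty)
            for i in test:
                preds[name][i] = model(xs[i])
    return preds

def simplex_grid(n_learners, steps):
    """All non-negative weight vectors summing to 1, in increments of 1/steps."""
    def rec(remaining, slots):
        if slots == 1:
            yield (remaining,)
            return
        for k in range(remaining + 1):
            for rest in rec(remaining - k, slots - 1):
                yield (k,) + rest
    for combo in rec(steps, n_learners):
        yield tuple(c / steps for c in combo)

def super_learner_weights(preds, ys, steps=20):
    """Pick the convex combination with the lowest cross-validated MSE (grid search)."""
    names = list(preds)
    best_w, best_err = None, float("inf")
    for w in simplex_grid(len(names), steps):
        combined = [sum(wj * preds[nm][i] for wj, nm in zip(w, names))
                    for i in range(len(ys))]
        err = mse(combined, ys)
        if err < best_err:
            best_w, best_err = w, err
    return dict(zip(names, best_w)), best_err

# Synthetic binary outcome whose risk rises with x (illustrative data only).
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [1.0 if rng.random() < 0.1 + 0.6 * x * x else 0.0 for x in xs]

cv_preds = cv_predictions(xs, ys)
single_cv_mse = {name: mse(p, ys) for name, p in cv_preds.items()}
weights, ensemble_cv_mse = super_learner_weights(cv_preds, ys)

# Final prediction function: refit every learner on all data, then combine.
fitted = {name: fit(xs, ys) for name, fit in LEARNERS.items()}
def super_learner(x):
    return sum(weights[name] * fitted[name](x) for name in LEARNERS)

mean_y = sum(ys) / len(ys)
outcome_variance = sum((y - mean_y) ** 2 for y in ys) / len(ys)
cv_r2 = 1 - ensemble_cv_mse / outcome_variance

print("single-learner CV MSE:", single_cv_mse)
print("weights:", weights)
print("ensemble CV MSE:", ensemble_cv_mse, "R^2:", cv_r2)
```

Because the weight grid includes the vectors that put all weight on one learner, the combined cross-validated error in this sketch can never exceed that of the best single candidate. The error reported for the ensemble here is computed on the same out-of-fold predictions used to choose the weights, so it is optimistic. An honest performance estimate would need an additional outer layer of cross-validation.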

S. Rose
