Impact of statistical learning methods on the predictive power of multivariate normal tissue complication probability models.

PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models.

METHODS AND MATERIALS: Three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy for head and neck cancer. The performance of each learning method was evaluated with a repeated cross-validation scheme to obtain a fair comparison among methods.

RESULTS: The LASSO and BMA methods produced models with significantly better predictive power than the stepwise selection method. Furthermore, the LASSO method yielded a model as easily interpretable as that of the stepwise method, in contrast to the less intuitive BMA method.

CONCLUSIONS: The commonly used stepwise selection method, although simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
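As an illustration of the evaluation idea only, the following is a minimal sketch of comparing two logistic models under repeated cross-validation. It is not the study's implementation: the data are synthetic, the penalty value `0.05` is fixed rather than tuned, the L1-penalized fit uses a simple proximal gradient loop, and the area under the ROC curve is assumed here as the performance measure. Stepwise selection and BMA are omitted, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, l1=0.0, steps=200, lr=0.5):
    """Proximal gradient descent for logistic regression; l1 > 0 gives a LASSO-type fit."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r / n
            for j in range(p):
                gw[j] += r * xi[j] / n
        b -= lr * gb  # the intercept is left unpenalized
        for j in range(p):
            v = w[j] - lr * gw[j]
            # Soft-thresholding is the proximal step for the L1 penalty.
            w[j] = math.copysign(max(abs(v) - lr * l1, 0.0), v)
    return w, b

def predict(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def auc(y, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))

def repeated_cv_auc(X, y, l1, repeats=3, folds=5, seed=0):
    """Mean AUC of pooled out-of-fold predictions over several reshuffled k-fold splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    per_repeat = []
    for _ in range(repeats):
        rng.shuffle(idx)
        oof = [0.0] * len(X)
        for k in range(folds):
            test = idx[k::folds]
            held = set(test)
            train = [i for i in idx if i not in held]
            model = fit_logistic([X[i] for i in train], [y[i] for i in train], l1=l1)
            for i, s in zip(test, predict(model, [X[i] for i in test])):
                oof[i] = s
        per_repeat.append(auc(y, oof))
    return sum(per_repeat) / len(per_repeat)

def make_data(n=120, p=6, seed=1):
    # Synthetic stand-in: only the first two of p predictors carry signal.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(1.5 * x[0] - 1.0 * x[1]) else 0 for x in X]
    return X, y

X, y = make_data()
for name, l1 in [("unpenalized", 0.0), ("L1-penalized", 0.05)]:
    print(name, round(repeated_cv_auc(X, y, l1), 3))
```

Using the same seed for both models means each is scored on identical splits, which is one way to keep the comparison fair; in practice the penalty would be tuned inside each training fold rather than fixed.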
