Prediction intervals for future BMI values of individual children - a non-parametric approach by quantile boosting

Background: The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children, based on a recent German birth cohort study with n = 2007 children, is problematic for standard parametric approaches, because the BMI distribution in childhood is typically skewed and the skewness varies with age.

Methods: We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We highlight the concept of conditional coverage as the criterion for assessing the accuracy of PIs. Because conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns to the present data.

Results: The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child.

Conclusions: Quantile boosting is a promising approach to constructing PIs with correct conditional coverage in a non-parametric way. It is particularly suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure and inherent variable selection, and can even account for longitudinal data structures.
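The idea of modelling the two borders of a PI directly as conditional quantiles can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single covariate (age), one linear base-learner, synthetic right-skewed data and hypothetical function names (`check_loss`, `boost_quantile`). Each boosting step fits the base-learner by least squares to the negative gradient of the check (pinball) loss and adds a small fraction `nu` of that fit to the current predictor.

```python
import random


def check_loss(y, f, tau):
    """Mean check (pinball) loss of predictions f at quantile level tau."""
    return sum((tau - (yi <= fi)) * (yi - fi) for yi, fi in zip(y, f)) / len(y)


def boost_quantile(x, y, tau, n_iter=2000, nu=0.1):
    """Gradient boosting of the check loss with one linear base-learner.

    Simplified illustration: a full additive model would hold several
    base-learners and update only the best-fitting one in each iteration.
    """
    n = len(x)
    x_mean = sum(x) / n
    xc = [xi - x_mean for xi in x]           # centred covariate
    sxx = sum(v * v for v in xc)
    intercept = sorted(y)[int(tau * (n - 1))]  # offset: empirical tau-quantile
    slope = 0.0
    for _ in range(n_iter):
        # Negative gradient of the check loss at the current fit.
        u = [tau if yi > intercept + slope * v else tau - 1.0
             for yi, v in zip(y, xc)]
        # Least-squares fit of the base-learner to u, shrunk by step length nu.
        intercept += nu * sum(u) / n
        slope += nu * sum(v * ui for v, ui in zip(xc, u)) / sxx
    return lambda x_new: intercept + slope * (x_new - x_mean)


# Synthetic data (an assumption, not the cohort data): right-skewed noise
# whose spread grows with age.
random.seed(1)
age = [random.uniform(2, 10) for _ in range(300)]
bmi = [15 + 0.2 * a + 0.1 * a * random.expovariate(1.0) for a in age]

# Borders of a nominal 90% PI: the 5% and 95% conditional quantiles.
lower = boost_quantile(age, bmi, 0.05)
upper = boost_quantile(age, bmi, 0.95)

coverage = sum(lower(a) <= b <= upper(a) for a, b in zip(age, bmi)) / len(age)
print(f"in-sample coverage of the nominal 90% PI: {coverage:.3f}")
for a in (2, 6, 10):
    print(f"age {a}: PI = [{lower(a):.2f}, {upper(a):.2f}]")
```

The printed coverage is an overall in-sample proportion, not the conditional coverage discussed above. Assessing conditional coverage requires knowing the data-generating process, which is why a simulation study is needed.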
