The Evolution of Boosting Algorithms

BACKGROUND The concept of boosting emerged from the field of machine learning. The basic idea is to boost the accuracy of a weak classifier by combining multiple instances of it into a single, more accurate prediction. This general concept was later adapted to statistical modelling; nowadays, boosting algorithms are often applied to estimate and select predictor effects in statistical regression models.

OBJECTIVES This review article highlights the evolution of boosting algorithms from machine learning to statistical modelling.

METHODS We describe the AdaBoost algorithm for classification as well as the two most prominent approaches to statistical boosting: gradient boosting and likelihood-based boosting. We outline their methodological background and present the most common software implementations.

RESULTS Although gradient boosting and likelihood-based boosting are typically treated separately in the literature, they share the same methodological roots and follow the same fundamental concepts. In contrast to the initial machine learning algorithms, which must be seen as black-box prediction schemes, they yield statistical models with a straightforward interpretation.

CONCLUSIONS Statistical boosting algorithms have gained substantial interest during the last decade and offer a variety of options for addressing important research questions in modern biomedicine.
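To give a flavour of how such statistical boosting models are fitted in practice, the following minimal R sketch uses the package mboost (one of the implementations covered in this review) to fit a component-wise gradient boosting model. The simulated data, variable names, and tuning values are illustrative assumptions, not taken from the article.

library(mboost)

# Illustrative data (assumption): continuous outcome y, three candidate predictors
set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(100)

# Component-wise gradient boosting with linear base-learners;
# mstop is the number of boosting iterations, nu the step length
fit <- glmboost(y ~ x1 + x2 + x3, data = dat,
                control = boost_control(mstop = 100, nu = 0.1))

# Choose the stopping iteration by resampling; stopping early
# induces shrinkage and intrinsic variable selection
cvr <- cvrisk(fit)
fit[mstop(cvr)]           # set the model to the selected number of iterations

coef(fit)                 # selected predictors with shrunken effect estimates

Because the algorithm updates one base-learner at a time and is stopped early, the resulting fit is a sparse, directly interpretable regression model, which is the key contrast to black-box ensembles drawn in the abstract.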
