Boosting Algorithms: Regularization, Prediction and Model Fitting

We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, which are particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions that can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions.
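As a rough illustration of the workflow the abstract describes, the following sketch fits a componentwise L2-boosting linear model with mboost, selects the stopping iteration by the corrected AIC, and extracts the selected variables and predictions. The data are simulated here (not taken from the paper), and the argument names reflect the documented mboost interface; treat the details as assumptions to be checked against the installed package version.

    library("mboost")

    set.seed(29)
    n <- 100
    p <- 10
    x <- matrix(rnorm(n * p), nrow = n)
    colnames(x) <- paste0("x", 1:p)
    y <- 2 * x[, 1] - x[, 2] + rnorm(n)   # only x1 and x2 are informative
    dat <- data.frame(y = y, x)

    ## componentwise L2-boosting for a linear model
    mod <- glmboost(y ~ ., data = dat, family = Gaussian(),
                    control = boost_control(mstop = 500, nu = 0.1))

    ## choose the stopping iteration by the corrected AIC
    ## and set the model to that iteration
    aic <- AIC(mod, method = "corrected")
    mod <- mod[mstop(aic)]

    coef(mod)                        # nonzero coefficients = selected variables
    pred <- predict(mod, newdata = dat)

The abstract's last claim, boosting with a user-specified loss function, corresponds to mboost's Family() constructor. A minimal sketch, assuming the standard ngradient/loss arguments, rebuilds the absolute-error loss (also available ready-made in mboost as the Laplace() family):

    ## user-specified loss: negative gradient and loss function
    L1Family <- Family(ngradient = function(y, f, w = 1) sign(y - f),
                       loss = function(y, f) abs(y - f),
                       name = "absolute error (illustrative)")
    mod_l1 <- glmboost(y ~ ., data = dat, family = L1Family)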
