Accelerated proximal boosting

Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics a gradient descent on a functional variable. This paper builds upon the proximal point algorithm, which is suited to minimizing non-differentiable empirical risks, to introduce a novel boosting approach called proximal boosting. Besides being motivated by non-differentiable optimization, the proposed algorithm benefits from Nesterov's acceleration in the same way as gradient boosting does [Biau et al., 2018], leading to a variant called accelerated proximal boosting. The advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and real-world data. In particular, we exhibit a favorable comparison over gradient boosting regarding convergence rate and prediction accuracy.
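The idea can be illustrated with a minimal sketch, under assumptions not fixed by the abstract: a least-squares loss (whose proximal operator has a closed form), regression trees as weak learners, and the standard t/(t+3) Nesterov extrapolation coefficient. At each round, the proximal point of the empirical risk is computed at the extrapolated predictions, and a tree is fitted to the proximal increment. This is an illustrative reconstruction, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def accelerated_proximal_boosting(X, y, n_rounds=30, gamma=1.0, depth=3):
    """Sketch of accelerated proximal boosting for least-squares regression.

    Each round computes the prox of 0.5*||. - y||^2 (step size gamma)
    at the extrapolated predictions G -- here in closed form -- then
    fits a regression tree to the proximal increment. A Nesterov-style
    extrapolation mixes the last two prediction vectors.
    """
    n = len(y)
    F = np.zeros(n)       # predictions of the main sequence
    G = np.zeros(n)       # predictions of the extrapolated sequence
    trees = []
    for t in range(n_rounds):
        # Closed-form prox of the quadratic risk at the extrapolated point.
        prox = (G + gamma * y) / (1.0 + gamma)
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, prox - G)
        F_prev, F = F, G + tree.predict(X)
        # Nesterov extrapolation coefficient t / (t + 3).
        G = F + (t / (t + 3.0)) * (F - F_prev)
        trees.append(tree)
    return F, trees
```

For a non-differentiable loss (e.g. absolute deviation), only the closed-form prox line would change, which is the point of casting boosting as a proximal point iteration rather than a gradient descent.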

[1] J. Friedman. Greedy function approximation: A gradient boosting machine. 2001.

[2] G. Biau et al. Accelerated gradient boosting. Machine Learning, 2018.

[3] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, 2014.

[4] P. L. Bartlett et al. Functional Gradient Techniques for Combining Hypotheses. 2000.

[5] T. Chen et al. XGBoost: A Scalable Tree Boosting System. KDD, 2016.

[6] G. Rätsch et al. An Introduction to Boosting and Leveraging. Machine Learning Summer School, 2002.

[7] G. Biau et al. Optimization by gradient boosting. Advances in Contemporary Statistics and Econometrics, 2017.

[8] G. Rätsch et al. On the Convergence of Leveraging. NIPS, 2001.

[9] Y. Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1997.

[10] J. Friedman. Additive logistic regression: A statistical view of boosting. 2000.

[11] P. Bühlmann et al. Boosting algorithms: Regularization, prediction and model fitting. arXiv:0804.2752, 2007.

[12] P. L. Combettes et al. Signal Recovery by Proximal Forward-Backward Splitting. Multiscale Modeling & Simulation, 2005.

[13] L. Breiman. Random Forests. Machine Learning, 2001.

[14] Y. Freund. Boosting a weak learning algorithm by majority. COLT '90, 1995.

[15] Y. Freund et al. Experiments with a New Boosting Algorithm. ICML, 1996.

[16] J. D. Malley et al. COBRA: A combined regression strategy. Journal of Multivariate Analysis, 2013.

[17] J. A. Bagnell et al. Generalized Boosting Algorithms for Convex Optimization. ICML, 2011.

[18] P. L. Bartlett et al. Boosting Algorithms as Gradient Descent. NIPS, 1999.

[19] M. Teboulle et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2009.

[20] L. Breiman. Population theory for boosting ensembles. 2003.

[21] T. Brox et al. iPiano: Inertial Proximal Algorithm for Nonconvex Optimization. SIAM Journal on Imaging Sciences, 2014.

[22] T. Zhang. Sequential greedy approximation for certain convex optimization problems. IEEE Transactions on Information Theory, 2003.

[23] G. Varoquaux et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 2011.

[24] L. Rosasco et al. Iterative Regularization for Learning with Convex Loss Functions. Journal of Machine Learning Research, 2015.

[25] R. T. Rockafellar. Monotone Operators and the Proximal Point Algorithm. 1976.

[26] V. N. Temlyakov. Greedy expansions in convex optimization. arXiv:1206.0393, 2012.

[27] M. Jaggi. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. ICML, 2013.

[28] T. Zhang. A General Greedy Approximation Algorithm with Applications. NIPS, 2001.

[29] E. Weinan et al. Functional Frank-Wolfe Boosting for General Loss Functions. arXiv, 2015.

[30] L. Breiman. Arcing classifier (with discussion and a rejoinder by the author). 1998.

[31] J. Friedman. Stochastic gradient boosting. 2002.

[32] L. Breiman. Prediction Games and Arcing Algorithms. Neural Computation, 1999.

[33] R. Schapire. The Strength of Weak Learnability. Machine Learning, 1990.