Accelerated proximal boosting

Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics a gradient descent on a functional variable. This paper builds upon the proximal point algorithm, which is suited to minimizing non-differentiable empirical risks, to introduce a novel boosting approach called proximal boosting. Besides being motivated by non-differentiable optimization, the proposed algorithm benefits from Nesterov's acceleration in the same way as gradient boosting does [Biau et al., 2018], leading to a variant called accelerated proximal boosting. The advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and real-world data. In particular, we exhibit a favorable comparison over gradient boosting regarding convergence rate and prediction accuracy.
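The idea can be illustrated with a minimal sketch, under assumptions not fixed by the abstract: a least-squares loss (whose proximal operator has a closed form), regression trees as weak learners, and the standard t/(t+3) Nesterov extrapolation coefficient. At each round, the proximal point of the empirical risk is computed at the extrapolated predictions, and a tree is fitted to the proximal increment. This is an illustrative reconstruction, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def accelerated_proximal_boosting(X, y, n_rounds=30, gamma=1.0, depth=3):
    """Sketch of accelerated proximal boosting for least-squares regression.

    Each round computes the prox of 0.5*||. - y||^2 (step size gamma)
    at the extrapolated predictions G -- here in closed form -- then
    fits a regression tree to the proximal increment. A Nesterov-style
    extrapolation mixes the last two prediction vectors.
    """
    n = len(y)
    F = np.zeros(n)       # predictions of the main sequence
    G = np.zeros(n)       # predictions of the extrapolated sequence
    trees = []
    for t in range(n_rounds):
        # Closed-form prox of the quadratic risk at the extrapolated point.
        prox = (G + gamma * y) / (1.0 + gamma)
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, prox - G)
        F_prev, F = F, G + tree.predict(X)
        # Nesterov extrapolation coefficient t / (t + 3).
        G = F + (t / (t + 3.0)) * (F - F_prev)
        trees.append(tree)
    return F, trees
```

For a non-differentiable loss (e.g. absolute deviation), only the closed-form prox line would change, which is the point of casting boosting as a proximal point iteration rather than a gradient descent.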

[1] J. Friedman. Greedy function approximation: A gradient boosting machine. 2001.

[2] G. Biau et al. Accelerated gradient boosting. Machine Learning, 2018.

[3] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, 2014.

[4] P. L. Bartlett et al. Functional Gradient Techniques for Combining Hypotheses. 2000.

[5] T. Chen et al. XGBoost: A Scalable Tree Boosting System. KDD, 2016.

[6] G. Rätsch et al. An Introduction to Boosting and Leveraging. Machine Learning Summer School, 2002.

[7] G. Biau et al. Optimization by gradient boosting. Advances in Contemporary Statistics and Econometrics, 2017.

[8] G. Rätsch et al. On the Convergence of Leveraging. NIPS, 2001.

[9] Y. Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1997.

[10] J. Friedman. Additive logistic regression: A statistical view of boosting. 2000.

[11] P. Bühlmann et al. Boosting algorithms: Regularization, prediction and model fitting. arXiv:0804.2752, 2007.

[12] P. L. Combettes et al. Signal Recovery by Proximal Forward-Backward Splitting. Multiscale Modeling & Simulation, 2005.

[13] L. Breiman. Random Forests. Machine Learning, 2001.

[14] Y. Freund. Boosting a weak learning algorithm by majority. COLT '90, 1995.

[15] Y. Freund et al. Experiments with a New Boosting Algorithm. ICML, 1996.

[16] J. D. Malley et al. COBRA: A combined regression strategy. Journal of Multivariate Analysis, 2013.

[17] J. A. Bagnell et al. Generalized Boosting Algorithms for Convex Optimization. ICML, 2011.

[18] P. L. Bartlett et al. Boosting Algorithms as Gradient Descent. NIPS, 1999.

[19] M. Teboulle et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2009.

[20] L. Breiman. Population theory for boosting ensembles. 2003.

[21] T. Brox et al. iPiano: Inertial Proximal Algorithm for Nonconvex Optimization. SIAM Journal on Imaging Sciences, 2014.

[22] T. Zhang. Sequential greedy approximation for certain convex optimization problems. IEEE Transactions on Information Theory, 2003.

[23] G. Varoquaux et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 2011.

[24] L. Rosasco et al. Iterative Regularization for Learning with Convex Loss Functions. Journal of Machine Learning Research, 2015.

[25] R. T. Rockafellar. Monotone Operators and the Proximal Point Algorithm. 1976.

[26] V. N. Temlyakov. Greedy expansions in convex optimization. arXiv:1206.0393, 2012.

[27] M. Jaggi. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. ICML, 2013.

[28] T. Zhang. A General Greedy Approximation Algorithm with Applications. NIPS, 2001.

[29] E. Weinan et al. Functional Frank-Wolfe Boosting for General Loss Functions. arXiv, 2015.

[30] L. Breiman. Arcing classifier (with discussion and a rejoinder by the author). 1998.

[31] J. Friedman. Stochastic gradient boosting. 2002.

[32] L. Breiman. Prediction Games and Arcing Algorithms. Neural Computation, 1999.

[33] R. Schapire. The Strength of Weak Learnability. Machine Learning, 1990.