SGLB: Stochastic Gradient Langevin Boosting

In this paper, we introduce Stochastic Gradient Langevin Boosting (SGLB), a powerful and efficient machine learning framework that can handle a wide range of loss functions and comes with provable generalization guarantees. The method is based on a special form of the Langevin diffusion equation specifically designed for gradient boosting. This allows us to guarantee global convergence even for multimodal loss functions, whereas standard gradient boosting algorithms can guarantee only a local optimum. SGLB is implemented as part of the CatBoost gradient boosting library, and it outperforms classic gradient boosting on classification tasks with the 0-1 loss function, which is known to be multimodal.
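
Since SGLB is shipped inside CatBoost, a minimal sketch of how one might enable it from Python is given below. This is an illustration under assumptions, not a definitive recipe: it injects Gaussian noise into the boosting updates in the spirit of Langevin dynamics, $d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t$, rather than reproducing the exact SGLB equations; the `langevin` and `diffusion_temperature` parameter names are taken from the CatBoost documentation and should be checked against the installed version, and the dataset and hyperparameter values are arbitrary.

```python
# Minimal sketch: enabling Langevin (SGLB-style) boosting in CatBoost.
# The `langevin` and `diffusion_temperature` parameters are assumed to be
# available in the installed CatBoost version; verify against its docs.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = CatBoostClassifier(
    iterations=500,
    learning_rate=0.1,
    langevin=True,                # assumed flag enabling the Langevin mode
    diffusion_temperature=10000,  # assumed noise-temperature parameter
    verbose=False,
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

The added noise is what lets the ensemble escape poor local optima of a multimodal objective, at the cost of an extra temperature hyperparameter to tune.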
