SnapBoost: A Heterogeneous Boosting Machine

Modern gradient boosting software frameworks, such as XGBoost and LightGBM, implement Newton descent in function space. At each boosting iteration, their goal is to find the base hypothesis, selected from some base hypothesis class, that is closest to the Newton descent direction in a Euclidean sense. Typically, the base hypothesis class is fixed to be all binary decision trees up to a given depth. In this work, we study a Heterogeneous Newton Boosting Machine (HNBM) in which the base hypothesis class may vary across boosting iterations. Specifically, at each boosting iteration, the base hypothesis class is chosen, from a fixed set of subclasses, by sampling from a probability distribution. We derive a global linear convergence rate for the HNBM under certain assumptions, and show that it agrees with existing rates for Newton’s method when the Newton direction can be perfectly fitted by the base hypothesis at each boosting iteration. We then describe a particular realization of an HNBM, SnapBoost, that, at each boosting iteration, randomly selects either a decision tree of variable depth or a linear regressor with random Fourier features. We describe how SnapBoost is implemented, with a focus on its training complexity. Finally, we present experimental results, using OpenML and Kaggle datasets, showing that SnapBoost achieves lower generalization loss than competing boosting frameworks, without taking significantly longer to tune.
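The heterogeneous boosting loop described in the abstract can be illustrated with a minimal sketch. This is not the SnapBoost implementation: it assumes a regression setting with squared loss (so the Newton direction reduces to the residuals), uses scikit-learn's DecisionTreeRegressor, RBFSampler, and Ridge as stand-ins for the two base hypothesis subclasses, and the hyperparameter names (p_tree, max_depth_range, n_fourier) are hypothetical.

```python
# Illustrative sketch of a heterogeneous Newton boosting loop (squared loss).
# NOT the SnapBoost implementation; base learners and hyperparameters are
# assumptions chosen for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge
from sklearn.kernel_approximation import RBFSampler  # random Fourier features


def fit_hnbm(X, y, n_rounds=100, learning_rate=0.1,
             p_tree=0.8, max_depth_range=(1, 6), n_fourier=100, seed=0):
    rng = np.random.default_rng(seed)
    pred = np.full(len(y), y.mean())           # initial constant model
    ensemble = [("const", y.mean())]
    for _ in range(n_rounds):
        residual = y - pred                    # Newton direction for squared loss
        if rng.random() < p_tree:
            # Subclass 1: decision tree with a randomly sampled depth.
            depth = int(rng.integers(max_depth_range[0], max_depth_range[1] + 1))
            h = DecisionTreeRegressor(max_depth=depth)
            h.fit(X, residual)
            ensemble.append(("tree", h))
            update = h.predict(X)
        else:
            # Subclass 2: linear (ridge) regressor on random Fourier features.
            rff = RBFSampler(n_components=n_fourier,
                             random_state=int(rng.integers(1 << 31)))
            Z = rff.fit_transform(X)
            lin = Ridge(alpha=1.0).fit(Z, residual)
            ensemble.append(("rff", (rff, lin)))
            update = lin.predict(Z)
        pred += learning_rate * update         # shrinkage step
    return ensemble
```

Prediction would sum the initial constant plus the learning-rate-scaled output of each stored base learner. The paper itself handles general twice-differentiable losses by fitting the Newton direction with gradient and Hessian statistics; the squared-loss shortcut above is used here only to keep the sketch short.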
