MixBoost: A Heterogeneous Boosting Machine

Modern gradient boosting software frameworks, such as XGBoost and LightGBM, implement Newton descent in a functional space. At each boosting iteration, their goal is to find the base hypothesis, selected from some base hypothesis class, that is closest to the Newton descent direction in a Euclidean sense. Typically, the base hypothesis class is fixed to be all binary decision trees up to a given depth. In this work, we study a Heterogeneous Newton Boosting Machine (HNBM) in which the base hypothesis class may vary across boosting iterations. Specifically, at each boosting iteration, the base hypothesis class is chosen, from a fixed set of subclasses, by sampling from a probability distribution. We derive a global linear convergence rate for the HNBM under certain assumptions, and show that it agrees with existing rates for Newton's method when the Newton direction can be perfectly fitted by the base hypothesis at each boosting iteration. We then describe a particular realization of an HNBM, MixBoost, that, at each boosting iteration, randomly selects either a decision tree of variable depth or a linear regressor with random Fourier features. We describe how MixBoost is implemented, with a focus on its training complexity. Finally, we present experimental results, using OpenML and Kaggle datasets, showing that MixBoost achieves a lower generalization loss than competing boosting frameworks, without taking significantly longer to tune.
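
To make the per-iteration sampling concrete, the following is a minimal sketch, in Python with NumPy and scikit-learn, of how one heterogeneous Newton-boosting round could be realized. It assumes a squared-error loss, for which the Newton direction reduces to the ordinary residual (for a general twice-differentiable loss, the base learner would instead be fit to -g_i/h_i by least squares weighted by the second derivatives h_i). The function names, the subclass probability p_tree, the depth range, and the number of random Fourier features are illustrative assumptions, not the settings of MixBoost itself.

# Sketch of heterogeneous Newton boosting with two base-hypothesis subclasses:
# a depth-limited regression tree and a ridge regressor on random Fourier features.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge
from sklearn.kernel_approximation import RBFSampler  # random Fourier features

rng = np.random.default_rng(0)

def fit_base_hypothesis(X, residual, p_tree=0.5, max_depth_range=(1, 6)):
    """Sample a base-hypothesis subclass, then fit it to the Newton direction."""
    if rng.random() < p_tree:
        depth = int(rng.integers(*max_depth_range))      # tree of random depth
        tree = DecisionTreeRegressor(max_depth=depth)
        tree.fit(X, residual)
        return lambda Z: tree.predict(Z)
    rff = RBFSampler(n_components=100, random_state=0)   # random Fourier feature map
    lin = Ridge(alpha=1.0)
    lin.fit(rff.fit_transform(X), residual)
    return lambda Z: lin.predict(rff.transform(Z))

def boost(X, y, n_rounds=100, learning_rate=0.1):
    """Additive ensemble; for squared error the Newton residual is simply y - F."""
    y = np.asarray(y, dtype=float)
    F = np.zeros(len(y))                                 # current ensemble prediction
    ensemble = []
    for _ in range(n_rounds):
        h = fit_base_hypothesis(X, y - F)                # fit the Newton residual
        F += learning_rate * h(X)
        ensemble.append(h)
    return lambda Z: learning_rate * sum(h(Z) for h in ensemble)

The sketch only illustrates the random selection of the base hypothesis class at each round; it makes no attempt to reproduce the training-complexity optimizations discussed in the paper.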

[1] Sergei Popov et al. Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data, 2019, ICLR.

[2] Rakesh Agrawal et al. SPRINT: A Scalable Parallel Classifier for Data Mining, 1996, VLDB.

[3] J. Friedman. Stochastic gradient boosting, 2002.

[4] Anna Veronika Dorogush et al. CatBoost: unbiased boosting with categorical features, 2017, NeurIPS.

[5] Sylvain Lamprier et al. Fair Adversarial Gradient Tree Boosting, 2019, IEEE International Conference on Data Mining (ICDM).

[6] R. Iman et al. Approximations of the critical region of the Friedman statistic, 1980.

[7] Mehryar Mohri et al. Regularized Gradient Boosting, 2019, NeurIPS.

[8] Olivier Teytaud et al. Exact Distributed Training: Random Forest with Billions of Examples, 2018, ArXiv.

[9] Cho-Jui Hsieh et al. GPU-acceleration for Large-scale Tree Boosting, 2017, ArXiv.

[10] Janez Demsar et al. Statistical Comparisons of Classifiers over Multiple Data Sets, 2006, J. Mach. Learn. Res.

[11] Max A. Little et al. Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection, 2007, BioMedical Engineering OnLine.

[12] Davide Anguita et al. Machine learning approaches for improving condition-based maintenance of naval propulsion plants, 2016.

[13] Bingsheng He et al. Privacy-Preserving Gradient Boosting Decision Trees, 2019, AAAI.

[14] Ameet Talwalkar et al. Non-stochastic Best Arm Identification and Hyperparameter Optimization, 2015, AISTATS.

[15] Fabio Sigrist et al. Gradient and Newton Boosting for Classification and Regression, 2018, Expert Syst. Appl.

[16] Max A. Little et al. Accurate Telemonitoring of Parkinson's Disease Progression by Noninvasive Speech Tests, 2009, IEEE Transactions on Biomedical Engineering.

[17] Mehryar Mohri et al. Deep Boosting, 2014, ICML.

[18] Haihao Lu et al. Randomized Gradient Boosting Machine, 2018, SIAM J. Optim.

[19] Xin-She Yang et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.

[20] Yoav Freund et al. Boosting a weak learning algorithm by majority, 1995, COLT '90.

[21] Tie-Yan Liu et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree, 2017, NIPS.

[22] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.

[23] Benjamin Fish et al. Fair Boosting: a Case Study, 2015.

[24] Francesca Mangili et al. Should We Really Use Post-Hoc Tests Based on Mean-Ranks?, 2015, J. Mach. Learn. Res.

[25] Tie-Yan Liu et al. DeepGBM: A Deep Learning Framework Distilled by GBDT for Online Prediction Tasks, 2019, KDD.

[26] Jianjun Li. A two-step rejection procedure for testing multiple hypotheses, 2008.

[27] Ronald L. Rivest et al. Introduction to Algorithms, 3rd Edition, 2009.

[28] Luís Torgo et al. OpenML: networked science in machine learning, 2014, SIGKDD Explorations.

[29] Sercan O. Arik et al. TabNet: Attentive Interpretable Tabular Learning, 2019, AAAI.

[30] R. Schapire. The Strength of Weak Learnability, 1990, Machine Learning.

[31] Tianqi Chen et al. XGBoost: A Scalable Tree Boosting System, 2016, KDD.

[32] Martin Jaggi et al. Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients, 2018, ArXiv.

[33] Fabio Sigrist et al. KTBoost: Combined Kernel and Tree Boosting, 2019, Neural Processing Letters.

[34] Jorma Rissanen et al. SLIQ: A Fast Scalable Classifier for Data Mining, 1996, EDBT.

[35] Benjamin Recht et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.

[36] Bulat Ibragimov et al. Minimal Variance Sampling in Stochastic Gradient Boosting, 2019, NeurIPS.

[37] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.