KTBoost: Combined Kernel and Tree Boosting

In this article, we introduce a novel boosting algorithm called KTBoost, which combines kernel boosting and tree boosting. In each boosting iteration, the algorithm adds either a regression tree or a reproducing kernel Hilbert space (RKHS) regression function to the ensemble of base learners. The intuition is that discontinuous trees and continuous RKHS regression functions complement each other, and that this combination allows for better learning of both continuous and discontinuous functions, as well as functions whose regularity varies across the input space. We show empirically that KTBoost outperforms both tree boosting and kernel boosting in terms of predictive accuracy on a wide array of data sets.
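The abstract describes the algorithm only at a high level. As a rough illustration, the sketch below implements the per-iteration choice between a regression tree and an RKHS base learner for L2-boosting. The squared-error loss, the RBF kernel, the selection rule (keeping whichever candidate better fits the current residuals), and all hyperparameters are illustrative assumptions made here, not the authors' reference implementation.

```python
# Minimal sketch of the KTBoost idea under illustrative assumptions:
# at each iteration, fit both a regression tree and a kernel ridge (RKHS)
# regressor to the current residuals and keep the one with the smaller
# training error, then take a shrunken step in that direction.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.kernel_ridge import KernelRidge

def ktboost_fit(X, y, n_iter=100, learning_rate=0.1, max_depth=3, alpha=1.0, gamma=1.0):
    # Start from the constant prediction that minimizes the squared-error loss.
    init = float(np.mean(y))
    f = np.full(len(y), init)
    learners = []
    for _ in range(n_iter):
        residuals = y - f  # negative gradient of the squared-error loss
        # Candidate base learners: a regression tree and an RKHS regressor.
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        kern = KernelRidge(alpha=alpha, kernel="rbf", gamma=gamma).fit(X, residuals)
        # Keep whichever candidate fits the current residuals better.
        candidates = [tree, kern]
        errors = [np.mean((residuals - c.predict(X)) ** 2) for c in candidates]
        best = candidates[int(np.argmin(errors))]
        f = f + learning_rate * best.predict(X)
        learners.append(best)
    return init, learners

def ktboost_predict(X, init, learners, learning_rate=0.1):
    # Sum the shrunken contributions of the selected base learners.
    f = np.full(X.shape[0], init)
    for learner in learners:
        f = f + learning_rate * learner.predict(X)
    return f
```

In the more general setting, the choice between the two base learners would be made with respect to the boosting objective for an arbitrary loss; squared error is used here only to keep the sketch short.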
