Gradient boosting machines, a tutorial

Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of an application, for example by being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods, with a strong focus on the machine-learning aspects of modeling. The theoretical background is complemented with descriptive examples and illustrations that cover all stages of gradient boosting model design. Considerations for managing model complexity are discussed, and three practical examples of gradient boosting applications are presented and analyzed in detail.
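The methodological core the tutorial develops is simple to state: starting from a constant model, each boosting iteration fits a base learner to the negative gradient of the chosen loss function evaluated at the current ensemble's predictions, then adds a damped version of that learner to the ensemble. The following is a minimal sketch of this loop, assuming squared-error loss (whose negative gradient is just the ordinary residual) and shallow scikit-learn regression trees as base learners; the names and parameter values (fit_gbm, n_estimators, learning_rate, max_depth) are illustrative choices, not taken from the article.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    # Additive model F(x) = f0 + learning_rate * sum_m h_m(x).
    f0 = np.mean(y)                      # constant initial model
    F = np.full(y.shape, f0)             # current ensemble predictions
    trees = []
    for _ in range(n_estimators):
        residuals = y - F                # negative gradient of squared-error loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)           # base learner approximates the gradient step
        F = F + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def predict_gbm(f0, trees, X, learning_rate=0.1):
    # learning_rate must match the value used during fitting.
    F = np.full(X.shape[0], f0)
    for tree in trees:
        F = F + learning_rate * tree.predict(X)
    return F

Swapping in a different loss changes only the residuals line: for absolute-error loss, for instance, the negative gradient would be np.sign(y - F). This is the customizability to different loss functions referred to above.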
