ada: An R Package for Stochastic Boosting

Boosting is an iterative algorithm that combines simple classification rules, each with only "mediocre" performance in terms of misclassification error rate, into a highly accurate classification rule. Stochastic gradient boosting enhances this procedure by incorporating a random mechanism at each boosting step, which improves both predictive performance and the speed of generating the ensemble. ada is an R package that implements three popular variants of boosting, together with a version of stochastic gradient boosting. In addition, it provides plots useful for data analysis, along with an extension to the multi-class case. The algorithms are illustrated with synthetic and real data sets.
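
The boosting idea summarized above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the ada package's R interface or implementation. It assumes discrete AdaBoost with decision stumps as the weak rules, and it stands in for the "random mechanism" by fitting each stump on a random subsample of the rows. All names (`fit_stump`, `stump_predict`, `boost`, `predict`, `bag_frac`) are hypothetical.

```python
import math
import random


def fit_stump(X, y, w, idx):
    """Return (feature, threshold, sign) minimising weighted error on rows idx."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in idx}):
            for sign in (1, -1):
                # Weighted error of the rule "predict sign if x[j] > t, else -sign".
                err = sum(w[i] for i in idx
                          if (sign if X[i][j] > t else -sign) != y[i])
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]


def stump_predict(stump, x):
    j, t, sign = stump
    return sign if x[j] > t else -sign


def boost(X, y, n_iter=20, bag_frac=0.5, seed=0):
    """Discrete AdaBoost sketch; labels in y must be -1 or +1."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform case weights
    ensemble = []
    for _ in range(n_iter):
        # Stochastic step (assumed form): fit on a random subsample of rows.
        idx = rng.sample(range(n), max(2, int(bag_frac * n)))
        stump = fit_stump(X, y, w, idx)
        pred = [stump_predict(stump, x) for x in X]
        # Weighted error on the full training set, clamped away from 0 and 1.
        err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified cases, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(s, x) for a, s in ensemble)
    return 1 if score >= 0 else -1


# Tiny usage example on a separable one-dimensional toy data set.
X = [[0], [1], [2], [3], [4], [5]]
y = [-1, -1, -1, 1, 1, 1]
model = boost(X, y, n_iter=5, bag_frac=1.0)
train_pred = [predict(model, x) for x in X]
```

Setting `bag_frac=1.0` fits every stump on all rows and so recovers plain, non-stochastic boosting; smaller values fit each stump on fewer rows, which is where the speed gain in generating the ensemble would come from in this sketch.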
