An Introduction to Ensemble Methods for Data Analysis

This article introduces ensemble statistical procedures as a special case of algorithmic methods. The discussion begins with classification and regression trees (CART), used as a didactic device to introduce many of the key issues. It then turns to cross-validation, bagging, random forests, and boosting. Major points are illustrated with analyses of real data.
