Tree-Based Methods

This chapter discusses tree-based methods, one of the three major machine learning paradigms considered in the book. It covers the information-theoretic approach used to construct classification and regression trees, together with a few simple examples that illustrate the characteristics of decision tree models. A short introduction to ensemble theory and ensembles of decision trees follows, leading to random forest models, which are discussed in detail. Unsupervised learning with random forests is reviewed in particular, as this capability is potentially important in unsupervised fault diagnostic systems. The interpretation of random forest models is treated through the assessment of variable importance and through partial dependence analysis, which examines the relationships between the predictor variables and the response variable. A brief review of boosted trees follows that of random forests, covering concepts such as gradient boosting and the AdaBoost algorithm. The use of tree-based ensemble models is illustrated by examples on rotogravure printing and on the identification of defects in hot-rolled steel plate.
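
To make the information-theoretic splitting criterion concrete, the following minimal Python sketch computes Shannon entropy and the information gain of a candidate split. The data and threshold are illustrative and not taken from the chapter.

    import numpy as np

    def entropy(labels):
        # Shannon entropy (in bits) of a vector of class labels.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(labels, feature, threshold):
        # Entropy reduction achieved by splitting `feature` at `threshold`.
        left = labels[feature <= threshold]
        right = labels[feature > threshold]
        if len(left) == 0 or len(right) == 0:
            return 0.0
        w_left = len(left) / len(labels)
        w_right = len(right) / len(labels)
        return entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))

    # Toy example: a threshold that separates the classes perfectly
    # yields a gain of 1 bit.
    y = np.array([0, 0, 0, 1, 1, 1])
    x = np.array([1.0, 1.2, 1.1, 3.0, 3.2, 2.9])
    print(information_gain(y, x, threshold=2.0))  # 1.0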
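
The unsupervised use of random forests mentioned above can also be sketched briefly. In Breiman's construction, a forest is trained to discriminate the real data from a synthetic copy with independently permuted columns, and the proximity of two samples is the fraction of trees in which they share a terminal node. The sketch below uses scikit-learn's RandomForestClassifier as a stand-in implementation; the data and names are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def rf_proximity(forest, X):
        # apply() returns the terminal-node index of every sample in every
        # tree; proximity is the fraction of trees in which two samples
        # land in the same leaf.
        leaves = forest.apply(X)                      # (n_samples, n_trees)
        n_samples, n_trees = leaves.shape
        prox = np.zeros((n_samples, n_samples))
        for t in range(n_trees):
            prox += leaves[:, t][:, None] == leaves[:, t][None, :]
        return prox / n_trees

    # Breiman's trick: label the real data as one class and a column-wise
    # permuted copy (which destroys the dependence structure) as the other.
    rng = np.random.RandomState(0)
    X_real = rng.randn(100, 4)
    X_synth = np.column_stack([rng.permutation(col) for col in X_real.T])
    X_all = np.vstack([X_real, X_synth])
    y_all = np.r_[np.ones(100), np.zeros(100)]

    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_all, y_all)
    P = rf_proximity(forest, X_real)   # 100 x 100 proximity matrix
    # 1 - P can serve as a dissimilarity matrix for clustering or
    # multidimensional scaling in an unsupervised fault diagnostic setting.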
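
Variable importance and partial dependence, the two interpretation tools named above, can likewise be sketched in a few lines. The example below reads off scikit-learn's impurity-based importances and computes a partial dependence curve directly from its definition (average the model's prediction over the data while sweeping one variable across a grid); the simulated data are illustrative only.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(-2, 2, size=(500, 3))   # three hypothetical process variables
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.randn(500)  # X[:, 2] is irrelevant

    forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    # Impurity-based variable importance: the irrelevant third variable
    # should receive a near-zero score.
    print(forest.feature_importances_)

    # Partial dependence of the response on the first variable.
    grid = np.linspace(-2, 2, 25)
    pd_curve = []
    for v in grid:
        X_v = X.copy()
        X_v[:, 0] = v                 # clamp variable 0, keep the others as observed
        pd_curve.append(forest.predict(X_v).mean())
    # pd_curve should trace sin(v) up to an additive shift.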
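
Gradient boosting admits a compact sketch for squared-error loss, where the negative gradient is simply the current residual: each round fits a small regression tree to what the ensemble still gets wrong. This is a sketch of the general idea under that loss, not of any particular implementation discussed in the chapter; the learning rate and tree depth are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost_fit(X, y, n_rounds=100, lr=0.1, max_depth=3):
        # Start from the mean, then repeatedly fit a shallow tree to the
        # residuals and take a small step (lr) in the direction it predicts.
        f0 = y.mean()
        residual = y - f0
        trees = []
        for _ in range(n_rounds):
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
            residual = residual - lr * tree.predict(X)
            trees.append(tree)
        return f0, trees

    def gradient_boost_predict(f0, trees, X, lr=0.1):
        return f0 + lr * sum(tree.predict(X) for tree in trees)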
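
AdaBoost, by contrast, reweights the training samples rather than fitting residuals: after each round, misclassified samples gain weight so that the next weak learner concentrates on them. A minimal sketch with decision stumps as the weak learners follows; labels are assumed to be coded as +1/-1, and all names are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=20):
        # y must be coded as +1 / -1.
        n = len(y)
        w = np.full(n, 1.0 / n)                  # uniform initial weights
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = np.clip(w[pred != y].sum(), 1e-10, None)  # weighted error
            if err >= 0.5:                       # no better than chance: stop
                break
            alpha = 0.5 * np.log((1 - err) / err)
            w *= np.exp(-alpha * y * pred)       # up-weight mistakes
            w /= w.sum()
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(stumps, alphas, X):
        votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
        return np.sign(votes)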
