Decision tree methods: applications for classification and prediction

Summary

Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. The method classifies a population into branch-like segments that form an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently handle large, complex datasets without imposing a complicated parametric structure. When the sample size is large enough, the study data can be divided into training and validation datasets: the training dataset is used to build the decision tree model, and the validation dataset is used to decide on the tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
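
To make the training/validation workflow concrete, here is a minimal pure-Python sketch of a CART-style classification tree. It is not the paper's method and not the SPSS or SAS procedures. It assumes numeric covariates and binary threshold splits chosen by Gini impurity, and it uses maximum tree depth as a simple stand-in for tree size (CART itself selects tree size by cost-complexity pruning). The function names (`gini`, `best_split`, `grow`, `predict`, `choose_depth`) and the synthetic data are hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    prediction: int                 # majority class of the training rows at this node
    feature: Optional[int] = None   # splitting covariate index; None marks a leaf node
    threshold: float = 0.0          # rows with x[feature] <= threshold go to the left child
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) that most reduces weighted child Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue  # a split must send rows to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow(X, y, depth, max_depth):
    """Recursively partition the data from the root node down to leaf nodes."""
    node = Node(prediction=Counter(y).most_common(1)[0][0])
    if depth >= max_depth or len(set(y)) == 1:
        return node  # stop: depth limit reached or node is pure
    split = best_split(X, y)
    if split is None:
        return node  # no split improves impurity, so this is a leaf
    f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    node.feature, node.threshold = f, t
    node.left = grow([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth)
    node.right = grow([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth)
    return node


def predict(node, x):
    """Follow the branches from the root until a leaf node is reached."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


def accuracy(tree, X, y):
    return sum(predict(tree, x) == t for x, t in zip(X, y)) / len(y)


def choose_depth(X_train, y_train, X_val, y_val, max_depths=range(1, 8)):
    """Grow trees on the training data; keep the depth that scores best on validation."""
    best_tree, best_depth, best_acc = None, None, -1.0
    for d in max_depths:
        tree = grow(X_train, y_train, 0, d)
        acc = accuracy(tree, X_val, y_val)
        if acc > best_acc:  # strict '>' keeps the smallest tree among ties
            best_tree, best_depth, best_acc = tree, d, acc
    return best_tree, best_depth, best_acc


if __name__ == "__main__":
    # Hypothetical synthetic data: class 1 iff x0 > 0.5 and x1 > 0.3, plus 10% label noise.
    rng = random.Random(0)
    X = [[rng.random(), rng.random()] for _ in range(300)]
    y = [int(a > 0.5 and b > 0.3) for a, b in X]
    y = [1 - t if rng.random() < 0.1 else t for t in y]
    cut = 200  # first 200 rows train the tree; the remaining 100 choose its size
    tree, depth, acc = choose_depth(X[:cut], y[:cut], X[cut:], y[cut:])
    print(f"chosen max_depth={depth}, validation accuracy={acc:.2f}")
```

Choosing the depth on held-out data rather than on the training data guards against an overgrown tree that fits noise in the training sample; the strict comparison means that when several depths tie on validation accuracy, the smaller tree is kept.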
