On Classification and Regression Trees for Multiple Responses and Its Application

In many application fields, multivariate approaches that simultaneously account for the correlation between responses are needed. The tree method can be extended to multivariate responses, such as repeated measures and longitudinal data, by modifying the split function to accommodate multiple responses. Recently, researchers have constructed decision trees for multiple continuous longitudinal responses and for multiple binary responses using the Mahalanobis distance and a generalized entropy index, respectively. However, these methods are restricted by response type: each handles only continuous or only binary responses. In this paper, we modify the tree procedure for a univariate response and propose a new tree-based method that can analyze multiple responses of any type by using GEE (generalized estimating equations) techniques. To compare the performance of the trees, we present simulation studies of the probability of selecting the true split variable. Finally, we present applications to epileptic seizure data and WWW data.
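Extending a tree to multiple responses changes only the split function; the recursive partitioning itself stays the same. As an illustration only (this is not the GEE-based criterion proposed in the paper, whose details are not given here), the sketch below scores candidate splits on one numeric covariate by the reduction in the within-node sum of squared Euclidean deviations of the response vectors from the node mean vector, a least-squares criterion in the spirit of multivariate regression trees. The function names `node_impurity` and `best_split` and the `min_node` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def node_impurity(responses: List[Vector]) -> float:
    """Sum of squared Euclidean distances of each response vector from the node mean vector.

    Illustrative least-squares impurity; NOT the paper's GEE-based split statistic.
    """
    if not responses:
        return 0.0
    n = len(responses)
    dim = len(responses[0])
    mean = [sum(r[j] for r in responses) / n for j in range(dim)]
    return sum(sum((r[j] - mean[j]) ** 2 for j in range(dim)) for r in responses)


def best_split(x: List[float], y: List[Vector], min_node: int = 1) -> Optional[Tuple[float, float]]:
    """Exhaustive search over thresholds on a single covariate x.

    Returns (threshold, impurity_reduction) for the split x <= threshold vs x > threshold
    that most reduces total within-node impurity, or None if no valid split exists.
    """
    n = len(x)
    parent = node_impurity(y)
    order = sorted(range(n), key=lambda i: x[i])  # observation indices sorted by x
    best: Optional[Tuple[float, float]] = None
    # k is the size of the left child; both children must have at least min_node cases.
    for k in range(min_node, n - min_node + 1):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # cannot separate tied covariate values
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        gain = parent - node_impurity(left) - node_impurity(right)
        if best is None or gain > best[1]:
            best = ((lo + hi) / 2, gain)  # midpoint threshold
    return best


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [[0, 0], [0, 0], [0, 0], [10, 10], [10, 10], [10, 10]]
    print(best_split(x, y))  # (3.5, 300.0)
```

Because this criterion treats the response coordinates as unrelated, it ignores their correlation; a Mahalanobis-type criterion would instead weight deviations by an inverse covariance matrix, and it handles only continuous responses, which is the kind of restriction the GEE-based approach is meant to remove.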
