An unbiased method for constructing multilabel classification trees

A new method for constructing multilabel classification trees is proposed. A test statistic for the equality of distributions of the multilabel target variable is used as the splitting criterion. The proposed method separates the selection of the splitting variable from the selection of the split point. It is compared with several existing methods in terms of bias and power in variable selection, and one simulated data set and two real data sets are used to compare the accuracies of the constructed trees. According to the comparative study, the proposed method outperforms the existing methods in some of these respects.
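A minimal sketch of the two-step idea (choose the splitting variable first, then the split point for that variable only) may help make the procedure concrete. It rests on assumptions not stated in the abstract: a Pearson chi-square test on the table of predictor group by observed label vector stands in for the paper's test statistic, numeric predictors are grouped into sample quartiles, the variable with the smallest p-value is selected (comparing p-values rather than raw statistics is the usual way to avoid favouring predictors with many categories or split points), and the split point is the threshold with the smallest two-group p-value. All function names (`chi2_sf`, `pearson_p_value`, `quantile_groups`, `select_variable`, `select_split`) are hypothetical.

```python
import math
from collections import Counter

# A label vector: one 0/1 indicator per label, e.g. (1, 0, 1).
Labels = tuple[int, ...]


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(dof), integer dof >= 1.

    Uses the closed forms for even and odd degrees of freedom
    (Abramowitz & Stegun 26.4.4 and 26.4.5), so only the standard library is needed.
    """
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{dof/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, dof // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # erfc(sqrt(x/2)) + sqrt(2/pi) exp(-x/2) * sum_r x^((2r-1)/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def pearson_p_value(groups: list, ys: list[Labels]) -> float:
    """Assumed stand-in test: Pearson chi-square on the table (group x label vector).

    Each distinct label vector is one column, so the test compares the joint
    distribution of the multilabel target across the groups.
    """
    n = len(ys)
    cells = Counter(zip(groups, ys))
    rows, cols = Counter(groups), Counter(ys)
    stat = 0.0
    for g, n_g in rows.items():
        for y, n_y in cols.items():
            expected = n_g * n_y / n
            stat += (cells[(g, y)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return chi2_sf(stat, dof) if dof > 0 else 1.0


def quantile_groups(x: list[float], k: int = 4) -> list[int]:
    """Assumed discretization: bin a numeric predictor into k sample-quantile groups."""
    ordered = sorted(x)
    edges = [ordered[len(ordered) * i // k] for i in range(1, k)]
    return [sum(xi > e for e in edges) for xi in x]


def select_variable(X: dict[str, list], ys: list[Labels]) -> str:
    """Step 1: choose the predictor with the smallest p-value (no split search here)."""
    best_name, best_p = None, 2.0
    for name, col in X.items():
        numeric = all(isinstance(v, (int, float)) for v in col)
        groups = quantile_groups(col) if numeric else col  # categories are their own groups
        p = pearson_p_value(groups, ys)
        if p < best_p:
            best_name, best_p = name, p
    return best_name


def select_split(x: list[float], ys: list[Labels]) -> float:
    """Step 2: for the chosen numeric predictor only, pick c for the split x <= c."""
    best_c, best_p = None, 2.0
    for c in sorted(set(x))[:-1]:
        p = pearson_p_value([xi <= c for xi in x], ys)
        if p < best_p:
            best_c, best_p = c, p
    return best_c


# Toy data: x1 drives the two labels, x2 is an unrelated categorical predictor.
X = {"x1": [1, 2, 3, 4, 5, 6, 7, 8], "x2": ["a", "b", "a", "b", "a", "b", "a", "b"]}
ys = [(1, 0)] * 4 + [(0, 1)] * 4
print(select_variable(X, ys))     # x1
print(select_split(X["x1"], ys))  # 4
```

Because the split-point search runs only on the variable already chosen in step 1, a predictor with many distinct values gains no advantage in variable selection from having more candidate thresholds; this sketch handles numeric split points only, and a categorical winner would need a separate subset search.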