Constructing a decision tree from data with hierarchical class labels

Most decision tree classifiers are designed to classify data with categorical or Boolean class labels. However, many practical classification problems involve class labels that are naturally organized as a hierarchy, such as ranges of test scores. In such a hierarchy, the ranges at the upper levels are less specific but easier to predict, while the ranges at the lower levels are more specific but harder to predict. To build a decision tree from this kind of data, we must decide how to classify the data so that the predicted class label is as specific as possible while the prediction remains as accurate as possible. To the best of our knowledge, no previous research has considered the induction of decision trees from data with hierarchical class labels. This paper proposes a novel algorithm for learning decision tree classifiers from such data. Empirical results show that the proposed method is efficient and that it is effective in terms of both prediction accuracy and prediction specificity.
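The trade-off between specificity and accuracy can be illustrated with a small sketch. It is not the algorithm proposed in the paper; it only shows, under assumed inputs, how a deeper label in a hierarchy covers fewer samples than its ancestors. The score ranges in `PARENT`, the function names, and the `min_accuracy` threshold are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical hierarchy of test-score ranges (child -> parent).
# These ranges are illustrative and do not come from the paper.
PARENT = {
    "0-100": None,
    "0-59": "0-100",
    "60-100": "0-100",
    "60-79": "60-100",
    "80-100": "60-100",
    "80-89": "80-100",
    "90-100": "80-100",
}


def ancestors_and_self(label):
    """Return the label followed by all of its ancestors up to the root."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def depth(label):
    """Depth of a label in the hierarchy; the root has depth 0."""
    return len(ancestors_and_self(label)) - 1


def most_specific_label(labels, min_accuracy):
    """Pick the deepest label that covers at least min_accuracy of the samples.

    A sample counts as covered by its own label and by every ancestor of
    that label, so coverage can only grow when moving up the hierarchy.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    if not 0.0 <= min_accuracy <= 1.0:
        raise ValueError("min_accuracy must be between 0 and 1")

    coverage = Counter()
    for label in labels:
        for node in ancestors_and_self(label):
            coverage[node] += 1

    total = len(labels)
    candidates = [n for n, c in coverage.items() if c / total >= min_accuracy]
    # Prefer the deepest candidate; break ties by higher coverage.
    best = max(candidates, key=lambda n: (depth(n), coverage[n]))
    return best, coverage[best] / total


# Class labels of the samples that reach one (hypothetical) tree node.
node_labels = ["90-100"] * 5 + ["80-89"] * 3 + ["60-79"] * 2

for threshold in (0.9, 0.8, 0.5):
    label, accuracy = most_specific_label(node_labels, threshold)
    print(f"min_accuracy={threshold}: predict {label} (accuracy {accuracy:.1f})")
```

Lowering the threshold lets the sketch return a narrower range at the cost of covering fewer samples, which is the tension the abstract describes.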
