A New Proposal for Tree Model Selection and Visualization

The most common approach to building a decision tree is a two-step procedure: growing a full tree and then pruning it back. The goal is to identify the tree with the lowest error rate. Alternative pruning criteria have been proposed in the literature. Within the framework of recursive partitioning by tree-based methods, this paper contributes both a visual representation of the data partition in a geometrical space and a method for selecting the decision tree. In our visual approach, the best tree and the weakest links can be identified directly by graphical analysis of the tree structure, without considering the pruning sequence. The resulting error rates are very similar to those returned by the classification and regression trees (CART) procedure, showing that this new way of selecting the best tree is a valid alternative to the well-known cost-complexity pruning.
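
For orientation, the cost-complexity pruning that serves as the baseline can be summarized by the standard CART definitions below. This is only a sketch of the baseline, not of the visual selection method proposed in the paper. The notation is an assumption, since the abstract fixes none: R(T) is the resubstitution error of tree T, |\tilde{T}| its number of terminal nodes, T_t the branch rooted at internal node t, and \alpha \ge 0 the complexity parameter.

```latex
% Cost-complexity measure of a tree T (standard CART definition)
R_{\alpha}(T) = R(T) + \alpha \, |\tilde{T}|

% Strength of the link at internal node t; the weakest link is the
% node minimizing g(t), and its branch T_t is pruned first
g(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}
```

Repeatedly pruning the weakest link yields the nested pruning sequence from which CART selects the final tree, usually by test-sample or cross-validated error.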
