Using Clustering to Improve Decision Trees Visualization

Decision trees are simple yet powerful decision support tools, and their graphical nature makes them well suited to visual analysis tasks. However, trees built from complex real-world data tend to be large and hard to display. This paper proposes an original solution for optimizing the visual representation of decision trees obtained from data. The solution combines clustering and feature construction, and introduces a new clustering algorithm that takes into account both the visual properties and the accuracy of decision trees. A prototype has been implemented, and the benefits of the proposed method are shown through several experiments performed on UCI datasets.
