Visualization support to better comprehend and improve decision tree classification modelling process: a survey and appraisal

Data mining (DM) can be defined as the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data. Modelling is the crucial step in which DM algorithms are applied to extract these patterns. For domain experts, who play significant roles in the DM process, to make the most efficient and effective use of DM tools, these tools must incorporate appropriate visualization to support the modelling process. Unfortunately, the study of how such visualization should be designed, in particular which components it should include and how they should be presented, has been rather limited. This paper surveys the current state of the art in applying visualization techniques to better comprehend and improve the decision tree modelling process in three modes: visualization of tree models, visualization of model evaluation, and visual interactive tree construction. A number of overlooked issues and areas that need improvement are identified by reviewing a collection of related research and by examining how six current DM software packages design a few important features in each mode of visualization support for decision tree classification modelling. Although this article focuses on decision tree classification modelling, the guidelines derived from this study can also benefit other modelling techniques. At the end of the paper, a desirable design of visualization support for DM modelling is proposed in the form of a conceptual model.
