Visualization for enhancing the data mining process

Visualization has proved to be a suitable paradigm for the analysis and exploration of datasets. Within the data mining cycle, however, it has been used mainly for displaying data and presenting output. Besides datasets, many other entities need to be explored and understood by users and analysts. In this paper, we describe the role of visualization in the data mining process and present a model that supports the interaction between users and data mining entities. We discuss visualizations of datasets, of the parameter spaces of data mining algorithms, of models induced from datasets, and of patterns generated by applying data mining algorithms to datasets. We have developed a Java-based testbed that implements the extended data mining model, with visual support for interacting with datasets, models, parameter spaces, and patterns. Experimental results based on several public datasets, data mining algorithms, multidimensional visualization techniques, and other novel visualizations clearly show the benefits of integrating visualization into the data mining process.
