Eureka!: an interactive and visual knowledge discovery tool

Visualization techniques can guide the data mining process because they support data partitioning and visual inspection of results, especially for high-dimensional data sets. In this paper we describe Eureka!, an interactive, visual knowledge discovery tool for analyzing high-dimensional numerical data sets. The tool combines a visual clustering method, used to hypothesize meaningful structures in the data, with a classification machine learning algorithm, used to validate the hypothesized structures. A two-dimensional representation of the available data lets users partition the search space by shape or density, according to criteria they deem optimal. A partition can be composed of regions of arbitrary shape, not necessarily spherical. The accuracy of the clustering results can be validated with different techniques included in the mining tool (e.g., a decision tree classifier).
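The workflow described above (project to two dimensions, let the user mark regions of arbitrary shape, then check the resulting labels with a classifier) can be illustrated with a minimal sketch. It is not the tool's implementation, and everything in it is an assumption made for illustration: the fixed linear projection `axes`, the hard-coded polygons standing in for regions a user would draw interactively, the synthetic four-dimensional data, and the one-level decision tree (`fit_stump`) used in place of a full decision tree classifier.

```python
import random

def project(point, axes):
    """Linear projection of an n-D point onto two axis vectors (assumed projection)."""
    return tuple(sum(p * a for p, a in zip(point, axis)) for axis in axes)

def in_polygon(pt, polygon):
    """Ray-casting test: True if the 2-D point lies inside the polygon."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def label_by_regions(points2d, regions):
    """Label each point with the index of the first region containing it, else -1."""
    return [next((i for i, poly in enumerate(regions) if in_polygon(p, poly)), -1)
            for p in points2d]

def fit_stump(X, y):
    """Fit a one-level decision tree: the (feature, threshold) split with fewest errors."""
    best = None
    for f in range(len(X[0])):
        vals = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive observed values.
        for t in ((a + b) / 2 for a, b in zip(vals, vals[1:])):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = max(set(left), key=left.count)
            r_lab = max(set(right), key=right.count)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]

def predict(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

# Synthetic 4-D data: two well-separated groups (illustrative only).
rng = random.Random(0)
data = ([[rng.gauss(0, 0.3) for _ in range(4)] for _ in range(40)]
        + [[rng.gauss(3, 0.3) for _ in range(4)] for _ in range(40)])

# Hypothetical projection and "user-drawn" regions: an L-shaped (non-convex)
# polygon and a square, showing that regions need not be spherical.
axes = ((0.5, 0.5, 0.0, 0.0), (0.0, 0.0, 0.5, 0.5))
regions = [
    [(-1.5, -1.5), (5.0, -1.5), (5.0, -1.0), (1.5, -1.0), (1.5, 1.5), (-1.5, 1.5)],
    [(1.5, 1.5), (4.5, 1.5), (4.5, 4.5), (1.5, 4.5)],
]
labels = label_by_regions([project(p, axes) for p in data], regions)

# Validate the hypothesized clusters: train on the original 4-D features with
# the region labels as classes, then measure accuracy on held-out points.
labeled = [(x, lab) for x, lab in zip(data, labels) if lab != -1]
rng.shuffle(labeled)
train, test = labeled[:60], labeled[60:]
stump = fit_stump([x for x, _ in train], [lab for _, lab in train])
accuracy = sum(predict(stump, x) == lab for x, lab in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

High held-out accuracy suggests that the regions marked in the two-dimensional view correspond to structure a classifier can recover from the original attributes; low accuracy would suggest the partition should be revised.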
