Interactive Clustering in Distributed Environment

Due to the explosion in the number of autonomous data sources, there is a growing need for effective approaches to distributed knowledge discovery and interactive data mining. This paper proposes a distributed VISTA system that extends an existing visual cluster rendering system to a distributed environment. First, the objects of each local dataset are grouped using the VISTA system, and the resulting centroids are taken as the local models. Then, the local models are combined to form a global model, again using VISTA. Finally, global clusters are automatically identified from the global model, and the corresponding objects are visually explored. Experiments are carried out on various datasets from the UCI machine learning data repository.
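
The three-stage flow described above (local grouping, combination of local models, global assignment) can be illustrated with a minimal sketch. It is not the VISTA system: VISTA groups objects interactively through visualization, whereas this sketch substitutes plain k-means for every grouping step as an assumption. The function names and the parameters `k_local` and `k_global` are hypothetical and do not come from the paper.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Component-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def nearest(point, centroids):
    # Index of the centroid closest to the given point.
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iters=20, seed=0):
    # Plain k-means: an assumed stand-in for the interactive VISTA grouping.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def distributed_clustering(sites, k_local, k_global):
    # Step 1: each site groups its own objects; the centroids form its local model.
    local_models = [kmeans(site, k_local, seed=i) for i, site in enumerate(sites)]
    # Step 2: only the local centroids are pooled and grouped into a global model.
    pooled = [c for model in local_models for c in model]
    global_model = kmeans(pooled, k_global)
    # Step 3: every object is assigned to its nearest global centroid.
    labels = [[nearest(p, global_model) for p in site] for site in sites]
    return global_model, labels
```

In this sketch only the centroids leave each site, so the amount of data exchanged depends on `k_local` rather than on the size of the local datasets. Calling `distributed_clustering(sites, 2, 2)` on a list of per-site point lists returns the global centroids together with one list of cluster labels per site.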
