Paper information - Combining Parallel Self-Organizing Maps and K-Means to Cluster Distributed Data

Clustering is the process of discovering groups within multidimensional data, based on similarities, with minimal knowledge of their structure. In previous work, we presented an algorithm (partSOM) for clustering distributed datasets based on self-organizing maps (SOM). This work extends that approach, presenting a strategy for efficient cluster analysis in distributed databases using SOM and K-means. The proposed strategy applies the SOM algorithm separately to each distributed dataset, each corresponding to a vertical partition of the database, to obtain a representative subset of each local dataset. These representative subsets are then sent to a central site, which fuses the partial results and applies the SOM and K-means algorithms to obtain the final result. Experimental results are compared with the traditional SOM and partSOM approaches on different datasets.
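The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not spell out the fusion step, so the sketch assumes each record is replaced by its best-matching local prototype and the per-partition prototypes are concatenated column-wise at the central site. All function names, map sizes, and training parameters are hypothetical choices made for illustration.

```python
import numpy as np

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns a (rows*cols, dim) codebook."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes from randomly chosen samples.
    codebook = data[rng.choice(len(data), rows * cols, replace=True)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5    # shrinking neighbourhood radius
            bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood
            codebook += lr * h[:, None] * (x - codebook)
            step += 1
    return codebook

def bmu_indices(data, codebook):
    """Index of the nearest codebook vector for every row of data."""
    d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(data, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = bmu_indices(data, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers, bmu_indices(data, centers)

def cluster_distributed(partitions, k, local_shape=(4, 4), central_shape=(5, 5)):
    """partitions: list of (n, d_i) arrays, the vertical slices of one dataset."""
    fused_parts = []
    for i, part in enumerate(partitions):
        # Local site: train a SOM on its own columns only.
        codebook = train_som(part, *local_shape, seed=i)
        idx = bmu_indices(part, codebook)
        # Only (codebook, idx) would need to leave the site; the central
        # site rebuilds the representative rows as codebook[idx].
        fused_parts.append(codebook[idx])
    # Central site: assumed fusion by column-wise concatenation.
    fused = np.hstack(fused_parts)
    central = train_som(fused, *central_shape, seed=99)
    # K-means groups the central SOM units; records inherit their unit's label.
    _, unit_labels = kmeans(central, k)
    return unit_labels[bmu_indices(fused, central)]
```

Calling `cluster_distributed([X[:, :2], X[:, 2:]], k=2)` on a four-column array `X` returns one cluster label per record. Under this assumed fusion, each site shares only its prototypes and best-matching-unit indices, not its raw columns.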

F. L. Gorgônio | J. Costa
