An efficient and robust combined clustering technique for mining in large spatial databases

Spatial data mining is the extraction of knowledge from large amounts of spatial data. It has become a highly demanding field because huge amounts of spatial data are collected in applications ranging from geo-spatial data to bio-medical knowledge. The amount of spatial data being collected is increasing exponentially and now far exceeds the human ability to analyze it. Recently, clustering has been recognized as a primary data mining method for knowledge discovery in spatial databases. A database can be clustered in many ways, depending on the clustering algorithm employed, the parameter settings used, and other factors. Multiple clusterings can be combined so that the final partitioning of the data provides a better clustering. Combining clusterings by means of neural networks can yield dramatic improvements in generalization performance. Another problem with most clustering algorithms is that the user must supply the desired number of clusters, yet the optimal number is often not known prior to execution. The main objective of this paper is to propose an efficient, robust combined clustering technique using neural networks for large image databases that does not require a priori knowledge of the proper number of clusters; it only requires the user to provide a maximum number of clusters. Results on real databases are given to show that the proposed technique can (i) improve quality and robustness, and (ii) enable distributed clustering.
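To make the idea of combining multiple clusterings concrete, the following is a minimal sketch and not the neural-network combiner proposed in the paper. It swaps in a simpler, well-known scheme: a co-association matrix records how often each pair of points falls in the same cluster across the base partitions, and points that co-occur in more than a chosen fraction of partitions are merged. The function names, the `threshold` parameter, and the toy partitions are all assumptions made for illustration. The sketch does share one property with the abstract's goal: the number of final clusters is not supplied by the user but emerges from the agreement among the base clusterings.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base partitions in which each pair of points shares a cluster."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Merge points whose co-association exceeds `threshold` (hypothetical parameter)."""
    n = len(labelings[0])
    co = co_association(labelings)
    parent = list(range(n))  # union-find forest over the points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy base partitions of six points (illustrative data, not from the paper).
# Label values are arbitrary per partition; only co-membership matters.
base_partitions = [
    [0, 0, 0, 1, 1, 1],  # a 2-cluster result
    [0, 0, 1, 1, 2, 2],  # a 3-cluster result that disagrees on points 2 and 3
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, with labels permuted
]

print(consensus_clusters(base_partitions))
```

In this toy case the second partition disagrees with the other two about points 2 and 3, but the majority vote encoded in the co-association matrix overrides it, which is the sense in which a combined clustering can be more robust than a single run.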
