Cooperative clustering

Data clustering plays an important role in many disciplines, including data mining, machine learning, bioinformatics, and pattern recognition, wherever the inherent grouping structure of data must be learned in an unsupervised manner. Many clustering approaches have been proposed in the literature, each with a different tradeoff between quality and complexity. Each algorithm works within its own domain space, and none is optimal for all datasets across different properties, sizes, structures, and distributions. This paper presents a novel cooperative clustering (CC) model in which multiple clustering techniques cooperate with the goal of increasing the homogeneity of objects within clusters. The CC model handles datasets with different properties by means of two data structures: a histogram representation of the pair-wise similarities and a cooperative contingency graph. These two data structures are designed to find the matching sub-clusters between different clusterings and to obtain the final set of clusters through a coherent merging process. The cooperative model is consistent and scalable in terms of the number of adopted clustering approaches. Experimental results show that the cooperative clustering model outperforms the individual clustering algorithms on a number of gene expression and text document datasets.
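
The two data structures named above can be illustrated with a minimal sketch. It is not the paper's implementation; it rests on several assumptions introduced here for illustration only: sub-clusters are taken to be the non-empty intersections of clusters from two clusterings, similarity is cosine similarity clamped to [0, 1], and the histogram uses equal-width bins. The function names `sub_clusters`, `cosine`, and `similarity_histogram` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def sub_clusters(labels_a, labels_b):
    """Group object indices by the pair (cluster in A, cluster in B).

    Assumption: a "sub-cluster" is a non-empty cell of the contingency
    table between two clusterings of the same objects.
    """
    cells = defaultdict(list)
    for idx, (a, b) in enumerate(zip(labels_a, labels_b)):
        cells[(a, b)].append(idx)
    return dict(cells)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_histogram(points, members, bins=5):
    """Count pair-wise similarities among `members` in equal-width bins on [0, 1].

    Assumption: similarities are clamped to [0, 1] before binning.
    """
    counts = [0] * bins
    for i, j in combinations(members, 2):
        s = min(max(cosine(points[i], points[j]), 0.0), 1.0)
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


# Toy data: five 2-D objects and two clusterings that disagree on object 4.
points = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0), (0.1, 1.0), (1.0, 1.0)]
labels_a = [0, 0, 1, 1, 0]
labels_b = [0, 0, 1, 1, 1]

cells = sub_clusters(labels_a, labels_b)
hist = similarity_histogram(points, cells[(0, 0)])
```

In this toy case the two clusterings agree on objects 0 to 3 and disagree on object 4, which therefore lands in a sub-cluster of its own. A merging step could then compare histograms of candidate sub-cluster pairs to decide which to combine; the criterion for doing so is not given in the abstract and is left out of the sketch.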
