Improving the quality of clustering using cluster ensembles

Clustering is a widely used data mining task that partitions data into groups (clusters) of similar objects. It becomes challenging on very large data sets, because no single clustering algorithm gives optimal results in every case. Cluster ensembles are an emerging answer to this problem: they aim to improve the robustness, stability, and accuracy of unsupervised clustering by combining multiple partitions, generated by different clustering algorithms, into a single clustering solution. One of the major problems in cluster ensembles is deriving an appropriate consensus function. This project implements cluster ensemble algorithms to improve the accuracy and efficiency of clustering. The first method used is the co-association method, in which data points are compared pairwise and a weight factor decides the labelling of each point. The second is normalized mutual information (NMI), in which the information shared between two clusterings is measured in order to aggregate the clusters. A 20% increase in accuracy was observed when the cluster ensemble method was used.
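
To make the two methods concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes each base partition is a list of integer labels over the same data points. The function names (`co_association`, `consensus_labels`, `nmi`), the 0.5 threshold standing in for the weight factor, and the choice to normalize mutual information by the geometric mean of the two entropies are all illustrative assumptions; the abstract does not specify them.

```python
from collections import Counter
from math import log, sqrt


def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_labels(matrix, threshold=0.5):
    """Link pairs whose co-association exceeds the (assumed) threshold,
    then label each connected component as one consensus cluster."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def nmi(a, b):
    """Mutual information of two labelings, divided by sqrt(H(a) * H(b))."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    ha = -sum(c / n * log(c / n) for c in pa.values())
    hb = -sum(c / n * log(c / n) for c in pb.values())
    if ha == 0 or hb == 0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)


# Three hypothetical base partitions of six points.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
matrix = co_association(partitions)
consensus = consensus_labels(matrix)
print("consensus:", consensus)
for k, labels in enumerate(partitions):
    print("NMI(consensus, partition %d) = %.3f" % (k, nmi(consensus, labels)))
```

In this sketch the NMI scores only report how much each base partition agrees with the consensus. An NMI-based consensus function would instead search for the labeling that maximizes the average NMI over all base partitions.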
