Measurement of similarity using link based cluster approach for categorical data

Clustering groups data so that the objects within a cluster are more similar to each other than to objects in other clusters. Cluster ensembles have been used to address the problem of clustering categorical data, but it has been observed that these techniques generate the final data partition from incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries left unknown, and this degrades the quality of the clustering result. To improve clustering quality, a link-based approach improves the conventional matrix by discovering the unknown entries through the similarity between clusters in an ensemble, and an efficient link-based algorithm is proposed for the underlying similarity assessment. In this paper we propose using the C-Rank link-based algorithm to improve clustering quality and to rank clusters in weighted networks. C-Rank consists of three major phases: (1) identification of candidate clusters; (2) ranking the candidates by integrated cohesion; and (3) elimination of non-maximal clusters. Finally, a graph partitioning technique is applied to a weighted bipartite graph formulated from the refined matrix to obtain the final clustering result.
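The idea of filling unknown entries of the ensemble-information matrix can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`build_clusters`, `cluster_similarity`, `refined_matrix`), the `decay` parameter, and in particular the similarity measure, which is a simple shared-neighbour stand-in and not necessarily the link-based measure or the C-Rank procedure used in the paper. An ensemble is given as a list of label vectors, one per base clustering.

```python
from itertools import combinations


def build_clusters(ensemble):
    """Map (clustering index, label) -> set of member point indices."""
    clusters = {}
    for m, labels in enumerate(ensemble):
        for i, lab in enumerate(labels):
            clusters.setdefault((m, lab), set()).add(i)
    return clusters


def jaccard(a, b):
    """Jaccard overlap of two non-empty member sets."""
    return len(a & b) / len(a | b)


def cluster_similarity(clusters):
    """Hypothetical link-style similarity (an assumption, not the paper's).

    Clusters are linked when their members overlap; two clusters are then
    similar when they share linked neighbours. Scores are scaled to [0, 1].
    """
    keys = list(clusters)
    w = {k: {} for k in keys}
    for a, b in combinations(keys, 2):
        s = jaccard(clusters[a], clusters[b])
        if s > 0:
            w[a][b] = s
            w[b][a] = s
    raw = {}
    for a, b in combinations(keys, 2):
        shared = set(w[a]) & set(w[b])
        raw[(a, b)] = sum(min(w[a][k], w[b][k]) for k in shared)
    top = max(raw.values(), default=0.0)
    sim = {}
    for (a, b), v in raw.items():
        val = v / top if top > 0 else 0.0
        sim[(a, b)] = sim[(b, a)] = val
    return sim


def refined_matrix(ensemble, decay=0.9):
    """Return (cluster keys, matrix) with formerly unknown entries filled.

    An entry is 1.0 when the point belongs to the cluster; otherwise it is
    decay * similarity between that cluster and the point's own cluster
    in the same base clustering.
    """
    clusters = build_clusters(ensemble)
    sim = cluster_similarity(clusters)
    keys = list(clusters)
    matrix = []
    for i in range(len(ensemble[0])):
        row = []
        for (m, lab) in keys:
            own = (m, ensemble[m][i])
            if own == (m, lab):
                row.append(1.0)
            else:
                row.append(decay * sim[(own, (m, lab))])
        matrix.append(row)
    return keys, matrix


# Two base clusterings of four data points.
keys, matrix = refined_matrix([[0, 0, 1, 1], [0, 0, 0, 1]])
for row in matrix:
    print(row)
```

Under these assumptions, the rows of the refined matrix (data points) and its columns (clusters) would form the two vertex sets of the weighted bipartite graph, with the matrix entries as edge weights; the partitioning step itself is not shown here.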
