Using Multiple Clustering Algorithms to Generate Constraint Rules and Create Consensus Clusters

Data clustering techniques are used to aid knowledge discovery when no additional information is available. Several clustering techniques produce reasonable results, although they often produce qualitatively distinct clusterings. In this paper, we study how different clustering algorithms produce different kinds of clusters and how those clusterings relate to one another. We also evaluate the possibility of merging differently generated clusterings into a new clustering that none of the original algorithms can produce. The main contribution of this paper is a new algorithm that merges previously generated clusterings based on must-link constraint rules built from the agreements among elements observed across those clusterings. This novel approach uses the entropy of these agreements to decide to which cluster an element should belong. Experimental results indicate that: 1) our approach can merge characteristics of the original clusterings; 2) in some situations, it captures new information from the data and improves results, mainly when considered from an external perspective; and 3) in no situation did it produce significantly worse results.
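The abstract names two ingredients (must-link rules derived from agreement across base clusterings, and an entropy of agreements used to place elements) without giving the algorithm itself. The following is therefore only a minimal Python sketch of one way those ideas could fit together, not the authors' method. Every choice in it is a hypothetical assumption: the `threshold` for declaring a must-link (1.0 means all base clusterings agree), treating connected must-link groups as consensus "cores", the per-core agreement score, and the `max_entropy` cutoff (in bits) below which a leftover element joins its best-agreeing core.

```python
import math
from collections import defaultdict
from itertools import combinations


def must_link_pairs(clusterings, threshold=1.0):
    """Pairs (i, j) placed in the same cluster by at least a `threshold`
    fraction of the base clusterings (1.0 = unanimous agreement)."""
    m, n = len(clusterings), len(clusterings[0])
    return [
        (i, j)
        for i, j in combinations(range(n), 2)
        if sum(labels[i] == labels[j] for labels in clusterings) / m >= threshold
    ]


def connected_groups(n, pairs):
    """Connected components of the must-link graph, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for x in range(n):
        groups[find(x)].append(x)
    return list(groups.values())


def agreement_entropy(x, cores, clusterings):
    """Hypothetical agreement score of element x with each core, plus the
    entropy (bits) of those scores normalized into a distribution."""
    scores = [
        sum(labels[x] == labels[y] for labels in clusterings for y in core)
        / (len(clusterings) * len(core))
        for core in cores
    ]
    total = sum(scores)
    if total == 0:
        return scores, math.inf  # x never agrees with any core member
    probs = [s / total for s in scores]
    return scores, -sum(p * math.log2(p) for p in probs if p > 0)


def consensus(clusterings, threshold=1.0, max_entropy=0.5):
    """Must-linked groups become consensus clusters; each leftover element
    joins its best-agreeing core only if its agreement entropy is low."""
    n = len(clusterings[0])
    groups = connected_groups(n, must_link_pairs(clusterings, threshold))
    cores = [g for g in groups if len(g) > 1]
    result = [-1] * n
    for k, core in enumerate(cores):
        for x in core:
            result[x] = k
    next_label = len(cores)
    for g in groups:
        if len(g) > 1:
            continue
        x = g[0]
        scores, h = agreement_entropy(x, cores, clusterings)
        if cores and h <= max_entropy:
            result[x] = max(range(len(cores)), key=scores.__getitem__)
        else:
            result[x] = next_label  # ambiguous element: keep it on its own
            next_label += 1
    return result


A = [0, 0, 0, 1, 1, 1]
B = [0, 0, 1, 1, 1, 1]
C = [1, 1, 1, 0, 0, 0]
print(consensus([A, B, C]))  # [0, 0, 2, 1, 1, 1]
```

With these three toy clusterings, elements {0, 1} and {3, 4, 5} are unanimously must-linked, while element 2 splits its agreement 2:1 between the two cores (entropy of about 0.92 bits). Under the default cutoff it therefore stays in its own cluster, giving a partition that none of the three inputs produced; raising `max_entropy` to 1.0 would instead attach it to the first core.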
