On pruning the search space for clustering ensemble problems

Clustering ensembles have become a popular technique in recent years because of their potential to improve clustering results. Roughly speaking, the technique combines different partitions of the same set of objects to obtain a consensus partition. A common way of defining the consensus partition is as the solution of the median partition problem, which makes it the solution of a complex optimization problem. In this paper, we study possible prunes of the search space for this optimization problem. In particular, we introduce a new prune that allows a dramatic reduction of the search space. We also characterize the family of dissimilarity measures that can take advantage of this prune, and we present two measures that belong to this family. We carry out an experimental study on synthetic data, comparing, under different conditions, the size of the original search space with its size after the proposed prunes. The reductions obtained are substantial, which can benefit the development of clustering ensemble algorithms. We also compare, on real data, the behavior of a simulated annealing-based ensemble algorithm in the original partition space and in the two proposed pruned spaces. In all cases, the proposed prunes allow the algorithm to find solutions closer to the theoretical optimum.
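To make the median partition problem concrete, the following is a minimal sketch of an exhaustive search over the unpruned partition space for a tiny ensemble. It rests on assumptions that are not taken from the paper: the dissimilarity is the pair-counting (symmetric difference) distance, which may or may not be one of the two measures proposed here, and the function names (set_partitions, pair_disagreement, median_partition) are hypothetical. The pruning itself is not shown.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def co_clustered_pairs(partition):
    """Return the set of unordered object pairs that share a block."""
    return {
        frozenset(pair)
        for block in partition
        for pair in combinations(block, 2)
    }


def pair_disagreement(p, q):
    """Assumed dissimilarity: pairs grouped together in exactly one of p, q."""
    return len(co_clustered_pairs(p) ^ co_clustered_pairs(q))


def median_partition(objects, ensemble):
    """Search all partitions for one minimising the summed dissimilarity."""
    candidates = list(set_partitions(objects))
    best = min(
        candidates,
        key=lambda c: sum(pair_disagreement(c, p) for p in ensemble),
    )
    return best, len(candidates)


# Illustrative ensemble of three partitions of four objects (made-up data).
objects = [1, 2, 3, 4]
ensemble = [
    [[1, 2], [3, 4]],
    [[1, 2], [3], [4]],
    [[1, 2, 3], [4]],
]
consensus, space_size = median_partition(objects, ensemble)
print("partitions searched:", space_size)
print("consensus partition:", sorted(sorted(block) for block in consensus))
```

The number of candidates is the Bell number of the object count (15 for four objects), and it grows so quickly that exhaustive search is only feasible for very small sets. This growth is why reducing the search space matters for practical algorithms.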
