Multi-objective Clustering Ensemble for Varying Number of Clusters

A clustering ensemble aims to obtain a final clustering by combining multiple diverse clustering solutions. It is an established and effective tool for producing a robust, accurate and stable consensus from the input clustering solutions, and a spectrum of approaches has been proposed over the years to generate the final ensemble from multiple solutions. One major drawback of most existing cluster ensemble approaches is that they require the final number of clusters as an input. In this paper, we propose a multi-objective optimization based algorithm for the cluster ensemble problem that optimizes two objective functions simultaneously. The first objective is to maximize the overall similarity of the reference clustering solution to the input solutions, whereas the second objective is to minimize the standard deviation of those similarity values, so that the reference solution is not biased toward any particular input solution. Moreover, the proposed model does not need the number of clusters to be supplied a priori, a capability that most state-of-the-art approaches lack. The effectiveness of the proposed technique over the existing approaches is demonstrated by applying it to eight real-life datasets.
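The two objectives described above can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the optimizer, so everything here is an assumption made for illustration: the plain Rand index stands in for the similarity measure, and the names `rand_index`, `objectives` and `dominates` are hypothetical. Because the Rand index compares pairs of points, the candidate and the input solutions may each use a different number of clusters.

```python
from itertools import combinations
from statistics import mean, pstdev


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree.

    A pair agrees when both labelings put the two points in the same
    cluster, or both put them in different clusters. Assumed here as
    the similarity measure; the abstract does not specify one.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def objectives(candidate, ensemble):
    """Return (mean similarity, standard deviation of similarities).

    The first value is to be maximized, the second minimized.
    """
    sims = [rand_index(candidate, member) for member in ensemble]
    return mean(sims), pstdev(sims)


def dominates(p, q):
    """Pareto dominance for (maximize, minimize) objective pairs."""
    no_worse = p[0] >= q[0] and p[1] <= q[1]
    strictly_better = p[0] > q[0] or p[1] < q[1]
    return no_worse and strictly_better


# Three input solutions over six points, with 3, 2 and 2 clusters.
ensemble = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
candidate = [0, 0, 1, 1, 1, 1]
avg_sim, spread = objectives(candidate, ensemble)
print(f"mean similarity={avg_sim:.3f}, std={spread:.3f}")
```

A multi-objective search would evaluate many candidate labelings this way and keep those that no other candidate dominates. The number of clusters is then a property of whichever candidate is selected, not an input.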
