Evolutionary multi-objective automatic clustering enhanced with quality metrics and ensemble strategy

Abstract Automatic clustering, which must detect an appropriate partitioning without a predefined number of clusters (k), is a challenging problem in unsupervised learning owing to the lack of prior domain knowledge. Although evolutionary multi-objective optimization (EMO) techniques are increasingly applied to automatic clustering, several issues remain under-explored. In this paper, we use quality metrics and an ensemble strategy to discover explicit and implicit knowledge that guides the optimization process. The quality and diversity of solutions, defined in terms of cluster validity indices in a manner similar to performance indicators for multi-objective optimization, are used to help solve automatic clustering problems and to reduce unnecessary computational overhead. Specifically, the main components involved in EMO-based automatic clustering, namely initialization, reproduction operations, and environmental selection, are discussed and refined. To determine the final partitioning, both quality metrics and a cluster ensemble strategy are considered, improving retrieval of the final solution in an unsupervised way. Experiments are conducted from several aspects with corresponding analyses, and the results confirm that the proposals are more efficient and effective for automatic clustering.
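As a rough illustration of how a cluster ensemble can help pick a final partitioning without labels, the sketch below builds a co-association matrix from a few candidate partitions with different k and returns the candidate that agrees best with it. This is not the procedure proposed in the paper: the function names (`co_association`, `agreement`, `select_final`), the agreement score, and the toy candidate partitions are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of candidate partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def agreement(labels, co):
    """Mean pairwise agreement between one partition and the ensemble (hypothetical score)."""
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(labels)), 2):
        same = 1.0 if labels[i] == labels[j] else 0.0
        total += 1.0 - abs(same - co[i][j])
        pairs += 1
    return total / pairs


def select_final(partitions):
    """Return the index of the candidate closest to the ensemble consensus, plus all scores."""
    co = co_association(partitions)
    scores = [agreement(p, co) for p in partitions]
    best = max(range(len(partitions)), key=scores.__getitem__)
    return best, scores


# Toy candidates standing in for a non-dominated set with different k (made-up data).
candidates = [
    [0, 0, 0, 1, 1, 1],  # k = 2
    [0, 0, 0, 1, 1, 2],  # k = 3
    [0, 0, 1, 2, 2, 2],  # k = 3
]
best, scores = select_final(candidates)
print(best, [round(s, 4) for s in scores])
```

On this toy input the first candidate (k = 2) is selected, because the other two each split one of the two groups that the ensemble mostly keeps together. An internal validity index could be combined with this score, but how the paper weighs quality metrics against the ensemble is not specified in the abstract.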
