Many-objective fuzzy centroids clustering algorithm for categorical data

Abstract Categorical data clustering algorithms, in contrast to numerical ones, are still in their infancy, although some algorithms have been proposed in the literature. Many clustering algorithms are posed as optimization problems in which internal cluster validity functions serve as the objectives for finding optimal partitions. However, most of these methods optimize a single criterion, which can detect only a particular structure or distribution of the data. To overcome this issue, this paper proposes a novel many-objective fuzzy centroids clustering algorithm for categorical data based on the reference-point-based non-dominated sorting genetic algorithm, which simultaneously optimizes several cluster validity indices. The proposed approach is built on an effective fuzzy centroids algorithm, which distinguishes it from competing k-modes-type methods. The fuzzy memberships are used for chromosome representation and are combined with a novel genetic operation to produce new solutions. Moreover, a variable-length encoding scheme is developed so that clusters can be found without prior knowledge of their number. Experiments on several data sets demonstrate the superiority of the proposed algorithm over other state-of-the-art methods in terms of clustering accuracy and stability. In addition, when the number of clusters is not predefined, the method can detect it along with a desirable clustering solution.
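As a rough illustration of the fuzzy centroids idea mentioned in the abstract, the following sketch derives one centroid per cluster from a fuzzy membership matrix. It is not the authors' implementation. It assumes that a fuzzy centroid stores, for each attribute, a membership-weighted distribution over category values, and that the dissimilarity between an object and a centroid is the weight placed on values the object does not take. The function names, the fuzzifier `m`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def fuzzy_centroids(data, memberships, m=1.5):
    """Build one fuzzy centroid per cluster (illustrative sketch only).

    data        -- list of tuples of categorical values, one tuple per object
    memberships -- memberships[i][k] is the degree of object i in cluster k
    m           -- fuzzifier exponent (assumed value, not from the source)
    """
    n_clusters = len(memberships[0])
    n_attrs = len(data[0])
    centroids = []
    for k in range(n_clusters):
        centroid = []
        for a in range(n_attrs):
            # Accumulate u_ik^m for each category value of attribute a.
            weights = defaultdict(float)
            for row, u in zip(data, memberships):
                weights[row[a]] += u[k] ** m
            total = sum(weights.values())
            # Normalise so the weights of one attribute sum to 1.
            if total > 0:
                centroid.append({v: w / total for v, w in weights.items()})
            else:
                centroid.append({})
        centroids.append(centroid)
    return centroids


def dissimilarity(centroid, row):
    """Sum, over attributes, of the centroid weight not on the object's value."""
    return sum(1.0 - dist.get(value, 0.0) for dist, value in zip(centroid, row))


# Toy data: three objects, two categorical attributes, two clusters.
data = [("a", "x"), ("a", "y"), ("b", "y")]
memberships = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
centroids = fuzzy_centroids(data, memberships)
```

In a genetic formulation such as the one the abstract describes, a membership matrix like `memberships` would play the role of a chromosome, and several validity indices computed from `centroids` would serve as the objectives. That mapping is sketched here only as an assumption.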
