Novel soft subspace clustering with multi-objective evolutionary approach for high-dimensional data

Many conventional soft subspace clustering techniques merge several criteria into a single objective to improve performance; however, the weighting parameters that balance those criteria are important but difficult to set. In this paper, a novel soft subspace clustering method with a multi-objective evolutionary approach (MOEASSC) is proposed to address this problem. The method treats two types of criteria as separate objectives and optimizes them simultaneously using a modified multi-objective evolutionary algorithm with new encoding and operators. An indicator called the projection similarity validity index (PSVIndex) is designed to select the best solution and the number of clusters. Experiments on many datasets demonstrate the usefulness of MOEASSC and PSVIndex, and show that the algorithm is insensitive to its parameters and scalable to large datasets.
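To illustrate what optimizing two criteria simultaneously means, as opposed to merging them with weights, the following minimal sketch extracts the non-dominated (Pareto-optimal) candidates from a set of two-objective scores. It is a generic illustration of Pareto dominance and not the MOEASSC algorithm itself: the function names, the assumption that both objectives are minimized, and the score values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives minimized).

    `a` dominates `b` when it is no worse on every objective and
    strictly better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Tuple[float, float]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (objective_1, objective_2) values for five candidate
# clusterings; these numbers are illustrative, not taken from the paper.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
front = pareto_front(candidates)
print(front)
```

No weighting parameter appears anywhere in this sketch: each candidate is kept or discarded purely by dominance. Choosing a single solution from the resulting front is a separate step, which the paper assigns to PSVIndex and which is not sketched here.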
