Symbiotic evolutionary subspace clustering

Emerging high-dimensional data sets have made traditional clustering algorithms increasingly inefficient. More sophisticated approaches are required to cope with the increasing dimensionality and cardinality of such data sets. Feature selection methods have been proposed as a solution; however, they fail for data sets in which different clusters are supported by different attributes. Subspace clustering algorithms have been introduced over the past decade for this category of data sets. We approach the problem from the perspective of Genetic Algorithms by adopting a hierarchical data structure deployed in three stages: (1) a traditional clustering algorithm is applied independently to each attribute of the data set, thus defining a grid of potential 1-d cluster centroids; (2) multi-dimensional cluster centroids are represented by indexing the 1-d cluster centroids; (3) the problem of finding the best combination of cluster centroids is converted into one of discrete optimization and solved with a multi-objective evolutionary algorithm, which uses group fitness evaluation to assign a fitness to a group of clusters, as defined in stage 2. Synthetic data sets with different characteristics are generated as the ground truth to evaluate the resulting algorithm for Evolutionary Subspace Clustering (ESC) and to benchmark it against alternative subspace and full-space clustering algorithms. ESC returns competitive accuracy while typically utilizing fewer attributes and scaling as the attribute count increases.
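The first two stages, and the way a group of clusters can label a data set, can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain 1-d k-means stands in for the unspecified "traditional clustering algorithm", and the function names, the dictionary encoding of an individual, and the mean absolute distance are all assumptions made for illustration. The multi-objective evolutionary search of stage 3 is not shown.

```python
import random


def kmeans_1d(values, k, iters=20, seed=0):
    """Plain 1-d k-means (assumed stand-in for the 'traditional' clusterer)."""
    rng = random.Random(seed)
    # Initialise from k distinct observed values (needs >= k distinct values).
    centroids = rng.sample(sorted(set(values)), k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            buckets[nearest].append(v)
        # An empty bucket keeps its previous centroid.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return sorted(centroids)


def build_grid(data, k):
    """Stage 1: cluster every attribute on its own; grid[a] lists 1-d centroids."""
    n_attrs = len(data[0])
    return [kmeans_1d([row[a] for row in data], k) for a in range(n_attrs)]


def decode(individual, grid):
    """Stage 2: an individual maps attribute index -> index of a 1-d centroid.

    Attributes absent from the mapping are not part of the cluster's subspace.
    """
    return {a: grid[a][i] for a, i in individual.items()}


def subspace_distance(point, centroid):
    """Mean absolute distance over the supported attributes only (assumed metric)."""
    return sum(abs(point[a] - c) for a, c in centroid.items()) / len(centroid)


def assign(data, group, grid):
    """Label each point with the nearest cluster in a group of individuals."""
    centroids = [decode(ind, grid) for ind in group]
    return [min(range(len(centroids)),
                key=lambda j: subspace_distance(p, centroids[j]))
            for p in data]


# Attribute 0 separates two clusters; attribute 1 is noise.
data = [(0.0, 5.0), (0.2, 1.0), (0.1, 9.0),
        (10.0, 3.0), (10.2, 7.0), (9.9, 2.0)]
grid = build_grid(data, k=2)
group = [{0: 0}, {0: 1}]  # two clusters, each supported by attribute 0 only
labels = assign(data, group, grid)
```

Because each individual stores only indices into the grid, searching for good clusters becomes a discrete problem over attribute and centroid choices, which is what makes an evolutionary algorithm applicable in stage 3.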
