Subspace Clustering of High Dimensional Data Using Differential Evolution

Subspace clustering approaches cluster high-dimensional data in different subspaces, that is, they group the data using different relevant subsets of dimensions. The technique has proven effective because distance measures lose their effectiveness in a high-dimensional space. This chapter presents SUBSPACE_DE, a novel evolutionary approach to bottom-up subspace clustering that is scalable to high-dimensional data. SUBSPACE_DE uses a self-adaptive DBSCAN algorithm to cluster the data instances in each individual attribute and in the maximal subspaces, and the self-adaptive DBSCAN algorithm accepts its input from a differential evolution (DE) algorithm. The proposed algorithm is tested on 14 real and synthetic datasets and compared with 11 existing subspace clustering algorithms, using evaluation metrics such as F1_Measure and accuracy. Under a success rate ratio ranking, SUBSPACE_DE performs considerably better in both accuracy and F1_Measure, and it also shows potential for scaling to high-dimensional datasets.
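The abstract names two building blocks without specifying how they connect: DBSCAN applied to each attribute on its own (the first, one-dimensional level of a bottom-up search) and differential evolution supplying DBSCAN's input. The following is a hypothetical, stdlib-only sketch, not the SUBSPACE_DE implementation. It assumes that DE tunes only DBSCAN's `eps` radius with `min_pts` fixed, and it uses an invented fitness (the fraction of clustered points minus a penalty proportional to `eps`) because the abstract does not state the objective. All function names are illustrative, and the step that merges one-dimensional clusters into maximal subspaces is not shown.

```python
# Hypothetical sketch: DE-tuned DBSCAN on each attribute (not the chapter's code).
import bisect
import random


def dbscan_1d(values, eps, min_pts):
    """DBSCAN on a single attribute; returns one label per point, -1 = noise."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]

    def region(k):
        # Sorted positions of all points within eps of position k (k included).
        return range(bisect.bisect_left(xs, xs[k] - eps),
                     bisect.bisect_right(xs, xs[k] + eps))

    labels = [None] * len(xs)  # None = not visited yet
    cluster = -1
    for k in range(len(xs)):
        if labels[k] is not None:
            continue
        seeds = region(k)
        if len(seeds) < min_pts:
            labels[k] = -1  # noise for now; a core point may later claim it
            continue
        cluster += 1
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:  # only core points grow the cluster
                queue.extend(nbrs)
    result = [0] * len(values)
    for pos, i in enumerate(order):
        result[i] = labels[pos]
    return result


def differential_evolution(fitness, bounds, pop_size=15, f=0.5, cr=0.9,
                           generations=40, seed=0):
    """Hand-rolled DE/rand/1/bin that maximises `fitness` inside box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one gene from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == j_rand or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip into the box
                else:
                    trial.append(pop[i][d])
            score = fitness(trial)
            if score >= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, score
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


def eps_fitness(values, eps, min_pts, penalty=0.5):
    """Hypothetical objective: share of clustered points minus a radius cost."""
    labels = dbscan_1d(values, eps, min_pts)
    coverage = sum(lab != -1 for lab in labels) / len(labels)
    span = (max(values) - min(values)) or 1.0
    return coverage - penalty * eps / span


def tune_attribute(values, min_pts=3, seed=0):
    """Let DE choose eps for one attribute, then cluster it with that eps."""
    span = (max(values) - min(values)) or 1.0
    bounds = [(1e-3 * span, 0.5 * span)]  # assumed search range for eps
    best, _ = differential_evolution(
        lambda x: eps_fitness(values, x[0], min_pts), bounds, seed=seed)
    return best[0], dbscan_1d(values, best[0], min_pts)


def cluster_each_attribute(rows, min_pts=3):
    """One DE-tuned DBSCAN run per attribute (column) of `rows`."""
    return [tune_attribute(list(col), min_pts) for col in zip(*rows)]
```

For example, on the attribute values `[0, 1, 2, 3, 4, 100, 101, 102, 103, 104, 50]` with `min_pts=3`, the tuned radius places the two dense runs in clusters 0 and 1 and labels the isolated value 50 as noise. The radius penalty is included because coverage alone tends to favour an ever-larger `eps`, which eventually absorbs noise and merges clusters.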