DENCLUE-DE: Differential Evolution Based DENCLUE for Scalable Clustering in Big Data Analysis

Clustering is one of the central tasks in data analysis, and many clustering methods have been proposed in the literature for big data analysis. Density-based clustering (DENCLUE) is a powerful unsupervised clustering method for very large data sets. In DENCLUE, hill climbing plays an important role in finding the density attractors. In this paper, we apply a differential evolution (DE) algorithm in place of hill climbing to find the global optimum solution. Within this model, we propose a Gaussian-based mutation function in DE to improve accuracy and execution time on the Spark platform. We test the approach on big data sets presented in the literature. Experimental results show that the proposed approach outperforms other variants in terms of execution time.
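To illustrate the idea of replacing hill climbing with DE, the following is a minimal single-machine sketch, not the implementation evaluated in the paper. It assumes a Gaussian kernel density estimate as the objective and interprets the "Gaussian-based mutation" as drawing the DE scale factor F from a normal distribution; the abstract does not specify the mutation function, so that choice is an assumption. All names and parameters (`kernel_density`, `de_density_attractor`, `f_mean`, `f_sigma`, `cr`) are hypothetical, and the Spark distribution step is omitted.

```python
import math
import random


def kernel_density(x, data, h):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    d = len(x)
    norm = len(data) * (h ** d) * (2.0 * math.pi) ** (d / 2.0)
    total = 0.0
    for p in data:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / norm


def de_density_attractor(start, data, h, pop_size=12, generations=40,
                         cr=0.9, f_mean=0.5, f_sigma=0.3, seed=0):
    """Search for a high-density point near `start` with DE (rand/1/bin).

    Assumption: the scale factor F is sampled from N(f_mean, f_sigma^2)
    for every trial vector; this stands in for the paper's unspecified
    Gaussian-based mutation function. Requires pop_size >= 4.
    """
    rng = random.Random(seed)
    d = len(start)
    # Initial population: the start point plus Gaussian perturbations of it.
    pop = [list(start)]
    pop += [[s + rng.gauss(0.0, h) for s in start] for _ in range(pop_size - 1)]
    fit = [kernel_density(ind, data, h) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            f = rng.gauss(f_mean, f_sigma)  # assumed Gaussian scale factor
            j_rand = rng.randrange(d)       # guarantees one mutated coordinate
            trial = []
            for j in range(d):
                if j == j_rand or rng.random() < cr:
                    trial.append(pop[a][j] + f * (pop[b][j] - pop[c][j]))
                else:
                    trial.append(pop[i][j])
            # Greedy selection: keep the trial only if density does not drop.
            trial_fit = kernel_density(trial, data, h)
            if trial_fit >= fit[i]:
                pop[i], fit[i] = trial, trial_fit

    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

Because the start point is part of the initial population and selection is greedy, the returned density is never lower than the density at the start point. In a DENCLUE-style pipeline, points whose attractors lie close together would then be grouped into one cluster; on Spark, the per-point search could plausibly run inside a map over partitions, though that is not shown here.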
