MapReduce framework based big data clustering using fractional integrated sparse fuzzy C means algorithm

Big data analytics extracts hidden patterns and correlations from massive volumes of data, termed big data, and has gained significant interest over traditional data-processing methodologies. Clustering plays a significant role in reducing the computational complexity of this analysis. Here, big data arriving from distributed sources is clustered within the MapReduce framework (MRF). The MRF has two functions. The map function is based on the proposed Fractional Sparse Fuzzy C-Means (FrSparse FCM) algorithm, and the reduce function is based on the particle swarm optimisation-based whale optimisation algorithm (P-Whale). The optimal centroids are first computed with FrSparse FCM in the mapper phase and are then tuned in the reducer phase, so the FrSparse FCM-based MRF processes the big data in parallel. Experiments are performed on the Skin and Localisation data sets from the UCI Machine Learning Repository, and performance is evaluated using accuracy and the Davies-Bouldin (DB) index. The proposed method achieves a maximum accuracy of 90.6012% and a minimum DB index of 5.33.
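The division of labour between mapper and reducer can be illustrated with a minimal Python sketch. It is not the FrSparse FCM or P-Whale method. Each mapper runs standard fuzzy C-means (fuzzifier m = 2, farthest-point initialisation and a fixed iteration count, all assumptions), and the reducer simply matches and averages the mappers' local centroids as a stand-in for the P-Whale tuning step. The function names `fcm`, `map_partition` and `reduce_centroids` are hypothetical, and the partitions are simulated in one process rather than on a real MapReduce cluster.

```python
import math
import random


def fcm(points, c, m=2.0, iters=50):
    """Standard fuzzy C-means on a list of equal-length tuples; returns c centroids.

    Plain FCM only: the fractional and sparse terms of FrSparse FCM are not modelled.
    """
    # Farthest-point initialisation (an assumption for this sketch): start from
    # the first point, then repeatedly add the point farthest from those chosen.
    centroids = [list(points[0])]
    while len(centroids) < c:
        far = max(points, key=lambda p: min(math.dist(p, v) for v in centroids))
        centroids.append(list(far))
    dim = len(points[0])
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, v), 1e-12) for v in centroids]  # avoid division by zero
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                      for j in range(c)])
        # Centroid update: mean of the points weighted by u_ij ** m.
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centroids[j] = [sum(wi * p[t] for wi, p in zip(w, points)) / total
                            for t in range(dim)]
    return [tuple(v) for v in centroids]


def map_partition(partition, c):
    """Mapper (hypothetical): cluster one data partition and emit its local centroids."""
    return fcm(partition, c)


def reduce_centroids(local_sets):
    """Reducer (hypothetical): match every mapper's centroids to the first mapper's, then average.

    A simple stand-in for the P-Whale tuning step, which is not reproduced here.
    """
    reference = local_sets[0]
    sums = [list(v) for v in reference]
    for cents in local_sets[1:]:
        remaining = list(cents)
        for j, ref in enumerate(reference):
            # Greedy matching: take the unused centroid nearest to this reference.
            best = min(remaining, key=lambda v: math.dist(v, ref))
            remaining.remove(best)
            sums[j] = [a + b for a, b in zip(sums[j], best)]
    n = len(local_sets)
    return [tuple(s / n for s in row) for row in sums]


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic blobs around (0, 0) and (5, 5), split across four simulated mappers.
    data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(100)]
    data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(100)]
    rng.shuffle(data)
    partitions = [data[i::4] for i in range(4)]
    local = [map_partition(part, 2) for part in partitions]
    print(reduce_centroids(local))
```

Averaging is the simplest possible reducer; the described method instead tunes the centroids with P-Whale, a hybrid of particle swarm and whale optimisation whose details are not given here.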
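The DB index used in the evaluation can be computed with a short function. This minimal sketch assumes the textbook Davies-Bouldin definition, DB = (1/k) Σ_i max_{j≠i} (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of the points in cluster i to its centroid c_i; fuzzy memberships are hardened by assigning each point to its nearest centroid. Whether the evaluation uses exactly this variant is an assumption, and the name `davies_bouldin` is hypothetical.

```python
import math


def davies_bouldin(points, centroids):
    """Davies-Bouldin index for a hard assignment of points to their nearest centroid.

    Requires at least two distinct centroids. Lower values mean more compact,
    better-separated clusters.
    """
    k = len(centroids)
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    # S_i: mean distance of cluster members to their centroid (scatter).
    scatter = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        s = sum(math.dist(p, centroids[i]) for p in members) / len(members) if members else 0.0
        scatter.append(s)
    total = 0.0
    for i in range(k):
        # D_i = max over j != i of (S_i + S_j) / d(c_i, c_j).
        total += max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(davies_bouldin(pts, [(0, 1), (10, 1)]))  # prints 0.2
```

A lower value indicates compact, well-separated clusters, which is why a minimum DB index is the target.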
