Different clustering algorithms for Big Data analytics: A review

The era of big data is advancing rapidly, with data growing in both size (volume) and diversity of formats (variety). This data comes from various sources, e.g. media, communication devices, the internet, and business, and handling it poses many difficulties and challenges. Data mining is a process designed to explore analytical data (typically business or market-related data, also known as “Big data”). There are several data mining techniques, such as outlier analysis, classification, clustering, prediction, and association rule mining. In this paper we discuss several applications of clustering and its importance. Clustering algorithms provide a powerful meta-learning tool for examining huge volumes of data. Numerous clustering techniques, both traditional and recently developed, are discussed with reference to large data sets, along with their pros and cons.
