A survey of clustering techniques for big data analysis

With the beginning of a new era, data has grown rapidly not only in size but also in variety, and analyzing such big data is difficult. Data mining is the technique by which useful information and hidden relationships among data are extracted. Traditional data mining approaches cannot be applied directly to big data, because they have difficulty analyzing data of this size and variety. Clustering is one of the major techniques used for data mining; it works by finding clusters, that is, groups of similar data. In this paper we discuss some of the current clustering techniques for big data mining. A comprehensive analysis of these techniques is carried out, and an appropriate clustering algorithm is recommended.
