Big Data Mining Based on Computational Intelligence and Fuzzy Clustering

The availability of huge amounts of heterogeneous data from different sources on the Internet has been termed the problem of Big Data. Clustering is widely used as a knowledge discovery tool that separates data into manageable parts, so there is a need for clustering algorithms that scale to big databases. In this chapter we explore various schemes that have been used to tackle big databases. Statistical features are extracted from the given dataset; redundant and irrelevant features are eliminated, and the most important and relevant features are selected by genetic algorithms (GA). Clustering with the reduced feature set requires less computational time and fewer resources. Experiments performed on standard datasets indicate that clustering based on the proposed scheme offers high clustering accuracy. Various quality measures were computed to check the clustering quality, and the proposed methodology was observed to improve the results significantly.
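The pipeline outlined in the abstract (reduce the feature set, cluster, then score the result) can be illustrated with a minimal sketch. This is not the chapter's implementation: it assumes plain fuzzy c-means with fuzzifier m = 2, uses the partition coefficient as one possible quality measure, and replaces the GA search with a hand-picked boolean mask (`selected`) standing in for whatever subset a GA would return. The synthetic data and all names are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def partition_coefficient(U):
    """Mean squared membership; lies in [1/c, 1], higher means crisper."""
    return float((U ** 2).sum() / U.shape[0])


# Synthetic data (assumption): 2 informative features plus 3 noise features.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 100)
informative = rng.normal(0.0, 0.5, size=(200, 2)) + 8.0 * y[:, None]
noise = rng.normal(0.0, 3.0, size=(200, 3))
X = np.hstack([informative, noise])

# Hand-picked mask standing in for the subset a GA would select (assumption).
selected = np.array([True, True, False, False, False])

centers, U = fuzzy_c_means(X[:, selected], n_clusters=2)
pc = partition_coefficient(U)
hard_labels = U.argmax(axis=1)
```

In a full GA, each candidate mask would be scored by a fitness function (for example a validity index such as the partition coefficient above) and evolved through selection, crossover, and mutation; the sketch evaluates only a single fixed mask.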
