REDUCTION OF BIG DATA SETS USING FUZZY CLUSTERING

Big Data comprises large-volume, growing data sets from multiple sources. The fundamental requirement is to extract useful information by exploring large volumes of data. Clustering is used as a preprocessing step to divide the data into manageable parts. Fuzzy clustering adds flexibility when clustering very large data sets, because each object can have membership in more than one cluster. Incremental Weighted Fuzzy C-Means (IWFCM) introduces a weight that describes the importance of each object in the clusters. IWFCM produces clusters with minimum run time and high quality. The e-book data set is processed in a Hadoop environment, which runs on the MapReduce framework, and the data is reduced using IWFCM.
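
The abstract does not spell out the IWFCM update rules, so the following is only a minimal sketch of how a weighted, chunk-by-chunk fuzzy c-means reduction could look. It assumes the commonly used weighted FCM updates (membership from inverse distances, centres as weight-and-membership-averaged points) and a single-pass style of carrying weighted centres from one chunk to the next. The function names `weighted_fcm` and `reduce_in_chunks`, the fuzzifier `m = 2`, and the choice of the first `c` points as initial centres are all illustrative assumptions, not the paper's method, and no Hadoop or MapReduce code is involved.

```python
import math


def weighted_fcm(points, weights, c, m=2.0, iters=100, tol=1e-6):
    """Weighted fuzzy c-means (illustrative sketch).

    points  : list of equal-length tuples
    weights : one positive weight per point (its assumed importance)
    Returns (centers, u), where u[i][k] is the membership of point i
    in cluster k and every row of u sums to 1.
    """
    # Assumption: the first c points serve as initial centres.
    centers = [list(p) for p in points[:c]]
    power = 2.0 / (m - 1.0)
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: closer centres receive higher membership.
        u = []
        for p in points:
            d = [math.dist(p, v) for v in centers]
            zero = [dk == 0.0 for dk in d]
            if any(zero):
                # Point sits on one or more centres: share membership there.
                n0 = sum(zero)
                u.append([1.0 / n0 if z else 0.0 for z in zero])
            else:
                inv = [dk ** -power for dk in d]
                s = sum(inv)
                u.append([x / s for x in inv])
        # Centre update: average of points weighted by w_i * u_ik^m.
        new_centers = []
        for k in range(c):
            coef = [w * (row[k] ** m) for w, row in zip(weights, u)]
            total = sum(coef)
            if total == 0.0:
                new_centers.append(centers[k])  # keep an empty cluster's centre
                continue
            new_centers.append(
                [sum(cf * p[j] for cf, p in zip(coef, points)) / total
                 for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def reduce_in_chunks(points, c, chunk_size, m=2.0):
    """Reduce a data set to c weighted centres, one chunk at a time.

    After each chunk, every centre carries the weight sum_i w_i * u_ik,
    so the total weight always equals the number of points seen so far.
    """
    carried_pts, carried_w = [], []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        pts = carried_pts + chunk
        w = carried_w + [1.0] * len(chunk)
        centers, u = weighted_fcm(pts, w, c, m)
        carried_w = [sum(wi * row[k] for wi, row in zip(w, u)) for k in range(c)]
        carried_pts = [tuple(v) for v in centers]
    return carried_pts, carried_w
```

In this sketch the reduced representation is the pair `(centres, weights)`: each chunk is replaced by `c` weighted points before the next chunk is read, so memory use depends on the chunk size rather than on the full data set. The first chunk must contain at least `c` points.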
