A cloud-based optimal fuzzy clustering of distributed data

Cloud computing is an infrastructure that allows the storage of large datasets and provides large-scale parallel computing resources, which permit faster computation on distributed data. This paper contributes a cloud-based fuzzy clustering algorithm for distributed datasets that detects the optimal partition from a global view. The proposed algorithm meets the confidentiality constraint, which prohibits sharing data between different resources, while guaranteeing the anonymity of the data located on the cloud servers. A series of experiments was conducted to evaluate the efficiency of the proposed algorithm. The results show that it performs well in terms of both clustering quality and response time.
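
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to run fuzzy c-means over distributed data without moving raw records. It assumes each site receives the current cluster centers, computes its own fuzzy memberships locally, and returns only per-cluster weighted sums; a coordinator then merges these sums into new centers. The function names `local_stats` and `distributed_fcm`, the fuzzifier `m = 2.0`, and the stopping tolerance are illustrative assumptions, not the paper's method. The sketch does not cover selecting the optimal number of clusters or any anonymization step.

```python
import math


def local_stats(points, centers, m=2.0):
    """Runs on one site: returns per-cluster weighted sums only.

    Raw points never leave the site (assumed confidentiality model).
    """
    dim = len(centers[0])
    num = [[0.0] * dim for _ in centers]  # sum of u^m * x per cluster
    den = [0.0 for _ in centers]          # sum of u^m per cluster
    for x in points:
        # Squared distances to each center, floored to avoid division by zero.
        d2 = [max(sum((a - b) ** 2 for a, b in zip(x, c)), 1e-12)
              for c in centers]
        # Standard FCM membership: u_k proportional to d_k^(-2/(m-1)).
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        for k, v in enumerate(inv):
            w = (v / total) ** m
            den[k] += w
            for j in range(dim):
                num[k][j] += w * x[j]
    return num, den


def distributed_fcm(sites, centers, m=2.0, iters=100, tol=1e-9):
    """Coordinator loop: merges site statistics into global centers."""
    dim = len(centers[0])
    for _ in range(iters):
        stats = [local_stats(points, centers, m) for points in sites]
        new_centers = []
        for k in range(len(centers)):
            den = sum(s[1][k] for s in stats)
            new_centers.append(
                [sum(s[0][k][j] for s in stats) / den for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Two hypothetical sites holding disjoint parts of the same dataset.
site_a = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
site_b = [[1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]
init = [[1.0, 1.0], [9.0, 9.0]]

global_centers = distributed_fcm([site_a, site_b], init)
```

Because the center update is a ratio of sums over all points, adding up the per-site sums gives the same centers as running fuzzy c-means on the pooled data, up to floating-point rounding. In this sketch the sites exchange only aggregates, which is one way to respect a constraint against sharing raw data; whether aggregates alone provide sufficient anonymity depends on the threat model.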
