A cloud-based optimal fuzzy clustering of distributed data

Cloud computing is an infrastructure that allows the storage of large datasets. It also provides large-scale parallel computing, which enables faster computation on distributed data. The contribution of this paper is a cloud-based fuzzy clustering algorithm for distributed datasets that detects the optimal partition from a global view. The proposed algorithm meets the confidentiality constraint, which prohibits sharing data between different resources, while guaranteeing the anonymity of the data stored on the cloud servers. A series of experiments was conducted to evaluate the efficiency of the proposed algorithm. The results demonstrate its performance in terms of both clustering quality and response time.
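The abstract does not spell out the algorithm. As an illustration only, the sketch below shows one common way to run fuzzy c-means (FCM) over horizontally partitioned data without moving raw records:

- Each site computes memberships against shared centroids.
- Each site returns only per-cluster aggregates.
- A coordinator merges those aggregates into new global centroids.

The number of clusters is chosen with the Xie-Beni validity index as a stand-in for "detecting the optimal partition". All function names (`init_stats`, `local_stats`, `distributed_fcm`, `xie_beni`, `best_partition`) are hypothetical. The random-membership initialization and the Xie-Beni criterion are likewise assumptions, not the authors' method.

```python
import numpy as np


def init_stats(X, c, m, rng):
    """Site-side initialization (assumed scheme): random fuzzy memberships
    whose rows sum to 1, returned only as per-cluster aggregates."""
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    W = U ** m
    return W.T @ X, W.sum(axis=0)


def local_stats(X, V, m=2.0):
    """Site-side FCM step: memberships of local points X w.r.t. global
    centroids V. Returns (sum_k u^m x_k, sum_k u^m, local objective)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # shape (n, c)
    d2 = np.fmax(d2, 1e-12)  # guard against a point lying exactly on a centroid
    inv = d2 ** (-1.0 / (m - 1.0))  # d_ik^(-2/(m-1))
    U = inv / inv.sum(axis=1, keepdims=True)  # standard FCM membership update
    W = U ** m
    return W.T @ X, W.sum(axis=0), float((W * d2).sum())


def distributed_fcm(sites, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Coordinator: merges site aggregates into global centroids.
    It never receives raw points, only sums, weights and objectives."""
    rng = np.random.default_rng(seed)
    stats = [init_stats(X, c, m, rng) for X in sites]
    V = sum(s[0] for s in stats) / sum(s[1] for s in stats)[:, None]
    J_prev, J = np.inf, np.inf
    for _ in range(max_iter):
        results = [local_stats(X, V, m) for X in sites]
        J = sum(r[2] for r in results)  # global objective for the current V
        V = sum(r[0] for r in results) / sum(r[1] for r in results)[:, None]
        if abs(J_prev - J) < tol:  # stop when the objective stabilizes
            break
        J_prev = J
    return V, J


def xie_beni(sites, V, m=2.0):
    """Xie-Beni index: global objective over n times the minimum squared
    centroid separation (lower is better). Sites share only J and counts."""
    J = sum(local_stats(X, V, m)[2] for X in sites)
    n = sum(len(X) for X in sites)
    d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # ignore each centroid's distance to itself
    return J / (n * d2.min())


def best_partition(sites, c_range, m=2.0):
    """Run distributed FCM for each candidate c; keep the lowest Xie-Beni."""
    best = None
    for c in c_range:
        V, _ = distributed_fcm(sites, c, m)
        score = xie_beni(sites, V, m)
        if best is None or score < best[0]:
            best = (score, c, V)
    return best[1], best[2]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
    data = np.vstack([ctr + rng.normal(scale=0.5, size=(100, 2)) for ctr in centers])
    rng.shuffle(data)
    sites = np.array_split(data, 3)  # three "cloud servers" holding disjoint rows
    c_best, V = best_partition(sites, range(2, 6))
    print(c_best)
    print(np.round(V, 2))
```

In this design, each site sends only a c×d matrix of weighted sums, a length-c weight vector, a scalar objective and a point count. This keeps raw records local. These aggregates are not a formal privacy guarantee, though, so a real deployment would need a scheme chosen for its threat model.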

Minyar Sassi Hidri | Rahma Souli-Jbali
