Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering

Datasets keep growing, and much of this data is essential to business. This rapid growth raises challenges of complexity and of how to extract the most important knowledge in reasonable time. Fuzzy C-means (FCM), one of the most efficient clustering algorithms and one widely used in pattern recognition, data compression, image segmentation, computer vision and many other fields, also struggles to process large datasets. In this paper, we propose novel hybrid clustering algorithms based on incremental clustering and initial selection to tune up FCM for the Big Data problem. The first algorithm takes the cells of a rectangular mesh covering the data points as representatives, while the second takes data points that have high influence on others as representatives. The representatives are then clustered by FCM, and the resulting centers are used as the initial centers for clustering the entire dataset. We analyze the new algorithms theoretically, including a comparison of solution quality when clustering the set of representatives versus the entire set. Experimental results on both simulated and real datasets show that the total computational time of the new methods, including the time to find representatives and to cluster, is lower than that of other relevant algorithms. Clustering quality is also validated. The findings are significant for research in soft computing and Big Data processing. Computing methodologies today face huge amounts of diverse and complex data structures, and processing speed is the main priority when judging the effectiveness of a specific method. The findings present practical algorithms and investigate their characteristics, which other researchers can reference in similar applications. The usefulness and significance of this research are demonstrated in real-life applications.
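The two-stage idea described above (pick representatives, cluster them with FCM, then reuse the resulting centers to initialize FCM on the full dataset) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the uniform grid with `cells_per_dim` cells per dimension, the use of per-cell means as representatives, the unweighted FCM on the representatives, and the fuzzifier `m = 2` are all assumptions made for illustration.

```python
import numpy as np

def fcm(X, c, m=2.0, centers=None, max_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means. Returns (centers, memberships, iterations)."""
    rng = np.random.default_rng(seed)
    if centers is None:
        # Random initial centers drawn from the data (assumed default).
        centers = X[rng.choice(len(X), size=c, replace=False)]
    centers = np.asarray(centers, dtype=float)
    for it in range(1, max_iter + 1):
        # Squared distances from every point to every center, shape (n, c).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the points weighted by u_ik^m.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, U, it

def grid_representatives(X, cells_per_dim=10):
    """Hypothetical mesh step: one representative (cell mean) per occupied cell."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Integer cell index of each point along each dimension.
    idx = np.minimum(((X - lo) / span * cells_per_dim).astype(int),
                     cells_per_dim - 1)
    _, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    k = inverse.max() + 1
    sums = np.zeros((k, X.shape[1]))
    np.add.at(sums, inverse, X)  # accumulate the points of each cell
    counts = np.bincount(inverse, minlength=k)
    return sums / counts[:, None]

def two_stage_fcm(X, c, cells_per_dim=10):
    """Cluster the representatives first, then warm-start FCM on all of X."""
    reps = grid_representatives(X, cells_per_dim)
    init_centers, _, _ = fcm(reps, c)
    return fcm(X, c, centers=init_centers)

# Small synthetic demo: three Gaussian blobs in 2-D.
rng = np.random.default_rng(42)
blobs = [rng.normal(loc, 0.3, size=(200, 2)) for loc in ([0, 0], [4, 0], [2, 4])]
X = np.vstack(blobs)
reps = grid_representatives(X)
centers, U, n_iter = two_stage_fcm(X, c=3)
```

Because the first FCM run touches only the representatives, its cost depends on the number of occupied cells rather than on the number of points; the second run then starts from centers that are presumably already close to a good solution, which is where a reduction in iterations over the full dataset would come from.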
