Ant Colony Optimization of Interval Type-2 Fuzzy C-Means with Subtractive Clustering and Multi-Round Sampling for Large Data

Fuzzy C-Means (FCM) is a widely accepted clustering technique. However, it often cannot handle the various uncertainties associated with data. Interval Type-2 Fuzzy C-Means (IT2FCM) improves on FCM because it can model uncertainty and efficiently minimize its effect. However, on large data IT2FCM often gets trapped in local optima and fails to find optimal cluster centers. To overcome this challenge, an Ant Colony Optimization (ACO) based approach is proposed. A further challenge is determining the number of clusters before clustering is performed. Subtractive clustering (SC) is an efficient technique for estimating an appropriate number of clusters. However, for large datasets ACO and SC take a long time to converge, which makes it challenging both to cluster the data and to estimate the correct number of clusters. To address the challenges of large datasets, a Multi-Round Sampling (MRS) technique is proposed. IT2FCM-ACO with SC and MRS performs clustering on subsets of the data and determines suitable cluster centers and the number of clusters. The obtained clusters are then extended to the entire dataset, which eliminates the need for IT2FCM to process the complete dataset. Thus, the objective of this paper is to optimize IT2FCM using the ACO algorithm and to estimate the optimal number of clusters using SC, while employing MRS to handle the challenges of voluminous data. Results from several clustering evaluation measures show the improved performance of IT2FCM-ACO-MRS compared to IT2FCM-ACO and IT2FCM. Speedup is computed for different dataset sample sizes: IT2FCM-ACO-MRS is found to be ≈1–5 times faster than IT2FCM and IT2FCM-ACO for medium datasets, and ≈30–150 times faster for large datasets.
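The sample-then-extend idea can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the paper's IT2FCM-ACO-MRS. Subtractive clustering follows Chiu's standard formulation, using the commonly cited defaults (radius `ra = 0.5`, squash factor 1.5, accept/reject ratios 0.5/0.15). The sampling wrapper is hypothetical: it runs SC on a few random subsets and takes a majority vote on the cluster count, which stands in for MRS. The resulting centers are then extended to the full dataset with a simplified interval type-2 membership, taken as the elementwise min/max of FCM memberships for two fuzzifiers (`m1 = 1.5`, `m2 = 2.5`, both assumed). The ACO search and the IT2FCM center updates are omitted entirely.

```python
import numpy as np
from collections import Counter


def subtractive_clustering(X, ra=0.5, accept=0.5, reject=0.15):
    """Chiu-style subtractive clustering; X must already be scaled to [0, 1]."""
    rb = 1.5 * ra                                # squash radius (common default)
    alpha, beta = 4.0 / ra**2, 4.0 / rb**2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    P = np.exp(-alpha * d2).sum(axis=1)          # initial potential of each point
    first = P.max()
    centers = []
    while True:
        k = int(np.argmax(P))
        pk = P[k]
        ratio = pk / first
        if ratio < reject:
            break                                # too little potential left: stop
        if ratio <= accept:
            # Gray zone: accept only if the point is far enough from existing centers.
            dmin = min(np.linalg.norm(X[k] - c) for c in centers)
            if dmin / ra + ratio < 1.0:
                P[k] = 0.0                       # reject this point, try the next one
                continue
        centers.append(X[k].copy())
        P = P - pk * np.exp(-beta * d2[k])       # subtract the new center's influence
    return np.array(centers)


def fcm_memberships(X, V, m):
    """Type-1 FCM membership of each row of X in each center (row) of V."""
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=-1)
    d = np.fmax(d, 1e-12)                        # guard against a point sitting on a center
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


def interval_memberships(X, V, m1=1.5, m2=2.5):
    """Simplified IT2 footprint (assumption): elementwise min/max over two fuzzifiers."""
    U1, U2 = fcm_memberships(X, V, m1), fcm_memberships(X, V, m2)
    return np.minimum(U1, U2), np.maximum(U1, U2)


def sample_then_extend(X, sample_size=200, rounds=3, seed=0):
    """Hypothetical MRS stand-in: estimate c on random subsets, then label all of X."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xs = (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # scale every feature to [0, 1]
    results = []
    for _ in range(rounds):
        idx = rng.choice(len(Xs), size=sample_size, replace=False)
        results.append(subtractive_clustering(Xs[idx]))
    c = Counter(len(V) for V in results).most_common(1)[0][0]   # majority vote on c
    V = next(V for V in results if len(V) == c)
    lower, upper = interval_memberships(Xs, V)                  # extend to the full data
    labels = np.argmax((lower + upper) / 2.0, axis=1)            # midpoint defuzzification
    return c, V, labels


# Usage on synthetic data: three well-separated Gaussian blobs, 1000 points each.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(mu, 0.3, size=(1000, 2))
               for mu in [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]])
c, V, labels = sample_then_extend(X)
print("estimated clusters:", c)
```

Only the small samples go through the O(s²) pairwise-distance step of subtractive clustering. The full dataset is touched once, when memberships are computed against the `c` fixed centers, and that is the step that makes sampling pay off on voluminous data.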