Incremental Interval Type-2 Fuzzy Clustering of Data Streams using Single Pass Method

Data streams pose new challenges for fuzzy clustering algorithms, in particular Interval Type-2 Fuzzy C-Means (IT2FCM). One known problem is that IT2FCM is sensitive to its initialization conditions and can therefore fail to reach the global optimum. This problem has been addressed by optimizing IT2FCM with Ant Colony Optimization (IT2FCM-ACO). However, IT2FCM-ACO computes clusters over the whole dataset at once, which is unsuitable for large streaming datasets that arrive continuously and evolve with time; the clusters generated from such data must evolve with it. Additionally, the incoming data may be too large to hold in memory all at once. To counter these challenges of a large data stream environment, we propose extending IT2FCM-ACO to generate clusters incrementally. The proposed algorithm first determines cluster centers on a certain percentage of the available data; the resulting centroids are then combined with newly arriving data points to generate another set of cluster centers. The process continues until all the data have been scanned. Data points that have already been processed are released from memory, which reduces time and space complexity. The proposed incremental method produces data partitions comparable to those of IT2FCM-ACO. Its performance is evaluated on large real-life datasets. Results on several fuzzy cluster validity indices show that the proposed method outperforms other clustering algorithms. It also improves run time and yields speed-ups on all datasets.
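The chunk-by-chunk procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain weighted type-1 fuzzy c-means stands in for IT2FCM-ACO, the initial centers are supplied by the caller instead of being found by ant colony optimization, and the function names `weighted_fcm` and `single_pass_cluster` are hypothetical. Weighting each carried-over centroid by the membership mass it has absorbed is also an assumption, since the abstract does not say how centroids and new points are combined.

```python
def weighted_fcm(points, weights, centers, m=2.0, iters=50, eps=1e-9):
    """Weighted type-1 fuzzy c-means (stand-in for IT2FCM-ACO).

    points  : list of equal-length tuples
    weights : one non-negative weight per point
    centers : list of initial center tuples
    Returns the final centers and the membership mass of each cluster.
    """
    c, dim = len(centers), len(points[0])
    for _ in range(iters):
        # Membership of every point in every cluster, from squared distances.
        u = []
        for x in points:
            d2 = [max(sum((a - b) ** 2 for a, b in zip(x, v)), eps)
                  for v in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            u.append([t / total for t in inv])
        # Center update: weighted mean with coefficients w_k * u_ik^m.
        new_centers = []
        for i in range(c):
            coef = [w * (u[k][i] ** m) for k, w in enumerate(weights)]
            norm = sum(coef)
            new_centers.append(tuple(
                sum(cf * x[j] for cf, x in zip(coef, points)) / norm
                for j in range(dim)))
        centers = new_centers
    # Mass absorbed by each cluster; becomes the centroid's weight later.
    mass = [sum(w * u[k][i] for k, w in enumerate(weights)) for i in range(c)]
    return centers, mass


def single_pass_cluster(chunks, init_centers, m=2.0):
    """Cluster a stream chunk by chunk, keeping only centroids in between."""
    centers, mass = [tuple(v) for v in init_centers], None
    for chunk in chunks:
        chunk = [tuple(x) for x in chunk]
        if mass is None:
            # First chunk: every point has unit weight.
            pts, wts = chunk, [1.0] * len(chunk)
        else:
            # Later chunks: previous centroids stand in for discarded data.
            pts = list(centers) + chunk
            wts = mass + [1.0] * len(chunk)
        centers, mass = weighted_fcm(pts, wts, centers, m)
        # The raw points of this chunk are not referenced after this step.
    return centers


if __name__ == "__main__":
    stream = [
        [(0.0,), (0.2,), (9.8,), (10.0,)],
        [(0.1,), (0.3,), (9.9,), (10.1,)],
    ]
    print(single_pass_cluster(stream, init_centers=[(1.0,), (9.0,)]))
```

Only the current chunk and one weighted centroid per cluster are held at any time, which is where the memory saving in a single-pass scheme comes from.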
