Ant Colony Stream Clustering: A Fast Density Clustering Algorithm for Dynamic Data Streams

A data stream is a continuously arriving sequence of data, and clustering data streams requires considerations beyond those of traditional clustering. A stream is potentially unbounded, data points arrive online, and each data point can be examined only once. This imposes limits on available memory and processing time. Furthermore, streams can be noisy, and the number of clusters in the data and their statistical properties can change over time. This paper presents an online, bio-inspired approach to clustering dynamic data streams. The proposed ant colony stream clustering (ACSC) algorithm is a density-based clustering algorithm, whereby clusters are identified as high-density areas of the feature space separated by low-density areas. ACSC identifies clusters as groups of micro-clusters. The tumbling window model is used to read a stream, and rough clusters are incrementally formed during a single pass of a window. A stochastic method is employed to find these rough clusters; this is shown to significantly speed up the algorithm with only a minor cost to performance, as compared to a deterministic approach. The rough clusters are then refined using a method inspired by the observed sorting behavior of ants. Ants pick up and drop items based on their similarity to the surrounding items. Artificial ants sort clusters by probabilistically picking and dropping micro-clusters based on local density and local similarity. Clusters are summarized using their constituent micro-clusters, and these summary statistics are stored offline. Experimental results show that ACSC is scalable and robust to noise, and that its clustering quality compares favorably with leading ant-clustering and stream-clustering algorithms. It also requires fewer parameters and less computational time.
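
To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the ACSC implementation. It shows (a) a tumbling window reader, which splits a stream into consecutive, non-overlapping windows of fixed size, and (b) the classic pick-up and drop probabilities from the ant sorting model of Deneubourg et al., which inspired this family of methods. The function names, the window size, and the constants `k1` and `k2` are assumptions chosen for illustration; the abstract does not state the exact probability functions ACSC uses.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def tumbling_windows(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive, non-overlapping windows of at most `size` items.

    Each item is read exactly once, matching the single-pass constraint
    of stream processing. The last window may be shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window


def pick_probability(f: float, k1: float = 0.1) -> float:
    """Classic ant-sorting pick-up probability (hypothetical default k1).

    `f` is the perceived local density/similarity in [0, 1]. An item in a
    sparse or dissimilar neighbourhood (small f) is likely to be picked up.
    """
    return (k1 / (k1 + f)) ** 2


def drop_probability(f: float, k2: float = 0.15) -> float:
    """Classic ant-sorting drop probability (hypothetical default k2).

    A carried item is likely to be dropped where the neighbourhood is
    dense or similar (large f).
    """
    return (f / (k2 + f)) ** 2


if __name__ == "__main__":
    for window in tumbling_windows(range(7), size=3):
        print(window)
    print(pick_probability(0.0), drop_probability(0.0))
```

In a stream-clustering setting, each window would be clustered in one pass and then discarded, so memory use is bounded by the window size rather than by the length of the stream.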
