A Comparative Study of Density-based Clustering Algorithms on Data Streams: Micro-clustering Approaches

Clustering is a challenging problem in data stream mining. A clustering algorithm must read a data stream in a single pass, with limited time and memory, while the underlying data may change over time. Different clustering algorithms have been developed for data streams. Density-based algorithms are a notable group of clustering algorithms that can find clusters of arbitrary shape and handle outliers as well. In recent years, density-based clustering algorithms have been adopted for data streams. However, when clustering data streams, it is impossible to store the entire stream. Micro-clustering is a summarization method used to record synopsis information about data streams. Various algorithms apply micro-clustering methods for clustering data streams. In this paper, we concentrate on the density-based clustering algorithms that use micro-clustering methods, which we refer to as density-micro clustering algorithms. We review the algorithms in detail and compare them based on different characteristics.
