MuDi-Stream: A multi density clustering algorithm for evolving data stream

Density-based methods have emerged as a worthwhile class of approaches for clustering data streams, and a number of density-based algorithms have recently been developed for this task. However, existing density-based data stream clustering algorithms are not without problems: clustering quality drops sharply when the data contain clusters of widely differing densities. In this paper, a new method, called MuDi-Stream, is developed. It is an online-offline algorithm with four main components. In the online phase, it keeps summary information about the evolving multi-density data stream in the form of core mini-clusters. The offline phase generates the final clusters using an adapted density-based clustering algorithm. A grid-based method is used as an outlier buffer to handle both noise and multi-density data, and also to reduce the merging time of clustering. The algorithm is evaluated on various synthetic and real-world datasets using different quality metrics, and scalability results are also compared. The experimental results show that the proposed method improves clustering quality in multi-density environments.
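To make the idea of an online summary more concrete, the following is a minimal sketch of a mini-cluster that absorbs stream points and fades over time. The abstract does not specify the structure of a core mini-cluster, so everything here is an assumption: the class name `MiniCluster`, the fields (weight, linear sum, squared sum), the exponential fading function, and the `decay` parameter follow a summary style common in density-based stream clustering and are not taken from MuDi-Stream itself.

```python
import math


class MiniCluster:
    """Hypothetical faded summary of nearby stream points (not the paper's definition)."""

    def __init__(self, dim):
        self.weight = 0.0                 # faded number of absorbed points
        self.linear_sum = [0.0] * dim     # faded per-dimension sum of coordinates
        self.squared_sum = [0.0] * dim    # faded per-dimension sum of squared coordinates
        self.last_update = 0.0            # timestamp of the most recent update

    def fade(self, now, decay):
        # Assumed fading function: statistics shrink by 2^(-decay * elapsed time).
        factor = 2.0 ** (-decay * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.squared_sum = [s * factor for s in self.squared_sum]
        self.last_update = now

    def insert(self, point, now, decay):
        # Age the existing statistics, then absorb the new point with weight 1.
        self.fade(now, decay)
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.squared_sum = [s + x * x for s, x in zip(self.squared_sum, point)]

    def center(self):
        return [s / self.weight for s in self.linear_sum]

    def radius(self):
        # Root of the summed per-dimension variances, clamped at zero for safety.
        variance = sum(
            max(sq / self.weight - (ls / self.weight) ** 2, 0.0)
            for ls, sq in zip(self.linear_sum, self.squared_sum)
        )
        return math.sqrt(variance)


mc = MiniCluster(dim=2)
mc.insert([0.0, 0.0], now=0.0, decay=1.0)
mc.insert([2.0, 0.0], now=0.0, decay=1.0)
print(mc.center(), mc.radius(), mc.weight)
```

In a sketch like this, an offline step would only need the centers, radii, and weights of the mini-clusters rather than the raw points, which is what makes an online-offline split practical for unbounded streams.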
