A buffer-based online clustering for evolving data stream

Abstract: Data stream clustering plays an important role in data stream mining for knowledge extraction. Density-based clustering algorithms have recently attracted considerable research attention because they can generate arbitrarily shaped clusters. However, most of these algorithms are fully offline, are hybrid online/offline, or cannot handle evolving data streams. Recently, a fully online clustering algorithm for evolving data streams called CEDAS was proposed. However, like other density-based clustering algorithms, CEDAS requires the globally optimal micro-cluster radius to be defined in advance, which is difficult, and an erroneous choice degrades clustering performance. Moreover, the algorithm ignores temporarily irrelevant micro-clusters, which may become relevant in the future. In this study, we present a fully online density-based clustering algorithm called buffer-based online clustering for evolving data stream (BOCEDS). The algorithm recursively updates the micro-cluster radius to its local optimum. It also introduces a buffer for storing irrelevant micro-clusters and a fully online pruning method for extracting temporarily irrelevant micro-clusters from the buffer. In addition, BOCEDS introduces an online micro-cluster energy-updating function based on the spatial information of the data stream. Experimental comparisons with CEDAS and with alternative hybrid online/offline density-based clustering algorithms show that BOCEDS outperforms them. The sensitivity of the clustering parameters is also measured. The proposed algorithm is then applied to real-world weather data streams to demonstrate its capability to detect changes in the data stream and discover arbitrarily shaped clusters. BOCEDS is available at https://sites.google.com/view/md-manjur-ahmed and https://sites.google.com/view/kamrul-just.
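To make the buffer idea concrete, the following is a minimal illustrative sketch in Python and not the BOCEDS algorithm itself. The abstract does not give the update equations, so every name and rule here is an assumption: the class `BufferedStreamSketch`, the parameters `init_radius` and `decay`, the linear energy fading with a reset to 1.0, and the recursive radius rule are all hypothetical stand-ins. The sketch only shows the general flow: active micro-clusters lose energy, faded ones move to a buffer instead of being deleted, and a buffered micro-cluster is brought back when a new point falls inside it.

```python
import math
from dataclasses import dataclass


@dataclass(eq=False)  # compare by identity so list.remove targets the exact object
class MicroCluster:
    center: list
    radius: float
    energy: float = 1.0
    count: int = 1


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BufferedStreamSketch:
    """Hypothetical sketch: active micro-clusters plus a buffer of faded ones."""

    def __init__(self, init_radius=1.0, decay=0.1):
        self.init_radius = init_radius
        self.min_radius = init_radius / 2  # assumed lower bound on the radius
        self.decay = decay                 # assumed linear energy loss per point
        self.active = []
        self.buffer = []

    def _nearest(self, pool, x):
        """Return (distance, micro-cluster) for the closest one containing x."""
        hits = [(dist(mc.center, x), mc) for mc in pool]
        hits = [h for h in hits if h[0] <= h[1].radius]
        return min(hits, key=lambda h: h[0]) if hits else None

    def insert(self, x):
        # 1. Fade every active micro-cluster; park faded ones in the buffer.
        for mc in self.active:
            mc.energy -= self.decay
        self.buffer.extend(mc for mc in self.active if mc.energy <= 0)
        self.active = [mc for mc in self.active if mc.energy > 0]

        # 2. Look for a containing micro-cluster, first active, then buffered.
        hit = self._nearest(self.active, x)
        if hit is None:
            hit = self._nearest(self.buffer, x)
            if hit is not None:  # revive a temporarily irrelevant micro-cluster
                self.buffer.remove(hit[1])
                self.active.append(hit[1])

        # 3. No match anywhere: start a new micro-cluster at x.
        if hit is None:
            self.active.append(MicroCluster(list(x), self.init_radius))
            return

        # 4. Recursive updates (placeholder rules, not the paper's equations).
        d, mc = hit
        mc.count += 1
        mc.energy = 1.0
        mc.center = [c + (xi - c) / mc.count for c, xi in zip(mc.center, x)]
        mc.radius += (max(d, self.min_radius) - mc.radius) / mc.count
```

With `decay=0.5`, a micro-cluster that receives no points for two arrivals moves to the buffer, and a later point near its centre returns it to the active set with its accumulated count intact. A design without the buffer would have to rebuild that micro-cluster from scratch.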
