Online Clustering of Evolving Data Streams Using a Density Grid-Based Method

In recent years, the volume of data available from persistent data streams has grown significantly. These data streams evolve continually, and their clusters frequently form arbitrary rather than regular shapes in the data space. This characteristic leads to an exponential increase in the processing time of traditional clustering algorithms for data streams. In this study, we propose a new online density grid-based method for data stream clustering. Its primary objectives are to reduce the number of distance function calls and to improve cluster quality. The method operates entirely online and consists of two main phases: the first phase generates the Core Micro-Clusters (CMCs), and the second phase combines the CMCs into macro clusters. The grid-based method is used as an outlier buffer to handle multi-density data and noise. The method was tested on real and synthetic data streams using different quality metrics and was compared with a popular method for clustering evolving data streams into arbitrarily shaped clusters. The results show that the proposed method is an effective solution for reducing the number of calls to the distance function and improving cluster quality.
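The two phases and the role of the grid can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class name `GridStreamSketch`, the parameters `cell_size`, `radius`, and `min_points`, the promotion rule (a cell becomes a CMC once it buffers `min_points` points), and the merge rule (CMC centers within `2 * radius`) are all assumptions chosen for illustration. The sketch also assumes `radius <= cell_size`, so that any CMC able to absorb a point is indexed in the point's own cell or an adjacent one.

```python
import math
from collections import defaultdict
from itertools import product


class GridStreamSketch:
    """Toy online clusterer: sparse grid cells buffer points, dense cells become CMCs."""

    def __init__(self, cell_size=1.0, radius=0.5, min_points=3):
        assert radius <= cell_size, "neighbour lookup assumes radius <= cell_size"
        self.cell_size, self.radius, self.min_points = cell_size, radius, min_points
        self.buffer = defaultdict(list)      # cell -> buffered points (outlier buffer)
        self.cmcs = []                       # each CMC: {"center": [...], "n": int}
        self.cmc_index = defaultdict(list)   # cell -> indices of CMCs centred in it
        self.distance_calls = 0              # counts every distance evaluation

    def _cell(self, p):
        return tuple(math.floor(x / self.cell_size) for x in p)

    def _dist(self, a, b):
        self.distance_calls += 1
        return math.dist(a, b)

    def _neighbours(self, cell):
        for offset in product((-1, 0, 1), repeat=len(cell)):
            yield tuple(c + o for c, o in zip(cell, offset))

    def _absorb(self, i, p):
        cmc = self.cmcs[i]
        old_cell = self._cell(cmc["center"])
        cmc["n"] += 1
        # Running mean keeps the centre following the stream.
        cmc["center"] = [c + (x - c) / cmc["n"] for c, x in zip(cmc["center"], p)]
        new_cell = self._cell(cmc["center"])
        if new_cell != old_cell:             # keep the grid index consistent
            self.cmc_index[old_cell].remove(i)
            self.cmc_index[new_cell].append(i)

    def insert(self, p):
        """Phase 1: absorb p into a nearby CMC, or buffer it in its grid cell."""
        cell = self._cell(p)
        # Only CMCs indexed in the same or adjacent cells are compared with p.
        for nb in self._neighbours(cell):
            for i in list(self.cmc_index.get(nb, ())):
                if self._dist(p, self.cmcs[i]["center"]) <= self.radius:
                    self._absorb(i, p)
                    return
        pts = self.buffer[cell]
        pts.append(list(p))
        if len(pts) >= self.min_points:      # dense cell: promote buffer to a CMC
            center = [sum(xs) / len(pts) for xs in zip(*pts)]
            self.cmcs.append({"center": center, "n": len(pts)})
            self.cmc_index[self._cell(center)].append(len(self.cmcs) - 1)
            del self.buffer[cell]

    def macro_clusters(self):
        """Phase 2: group CMCs whose centres lie within 2 * radius (union-find)."""
        parent = list(range(len(self.cmcs)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(len(self.cmcs)):
            for j in range(i + 1, len(self.cmcs)):
                d = self._dist(self.cmcs[i]["center"], self.cmcs[j]["center"])
                if d <= 2 * self.radius:
                    parent[find(i)] = find(j)
        groups = defaultdict(list)
        for i in range(len(self.cmcs)):
            groups[find(i)].append(i)
        return list(groups.values())


# Hypothetical toy stream: two adjacent dense regions, one distant region, one outlier.
stream = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2), (0.4, 0.4),
          (1.05, 0.3), (1.1, 0.2), (1.15, 0.4),
          (5.1, 5.1), (5.2, 5.2), (5.3, 5.3), (9.5, 9.5)]
model = GridStreamSketch(cell_size=1.0, radius=0.5, min_points=3)
for point in stream:
    model.insert(point)
print(len(model.cmcs), "CMCs,", model.distance_calls, "distance calls in phase 1")
print("macro clusters (CMC indices):", model.macro_clusters())
```

In this sketch the grid serves two purposes: points in cells that never become dense stay in the buffer and are treated as potential outliers, and the cell index restricts distance evaluations to CMCs in neighbouring cells instead of all CMCs. The pairwise merge in `macro_clusters` is kept quadratic for brevity; the same cell index could be used to prune it.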
