A Novel Streaming Data Clustering Algorithm Based on Fitness Proportionate Sharing

As an unsupervised learning technique, clustering can effectively capture the patterns in a data stream based on similarities among the data. Traditional data stream clustering algorithms depend heavily on prior knowledge or predefined parameters, even though the characteristics of real-time data are generally unknown in advance. In addition, they rely on a user-specified threshold to suppress the effect of outliers and noise, and the choice of this threshold significantly affects clustering performance. Overlap among clusters is another major challenge for existing stream clustering methods. These constraints strongly limit their use in real-time applications. In this paper, a two-phase stream clustering algorithm based on fitness proportionate sharing is proposed. It handles streaming data when prior knowledge is not available and maps the clustering problem into a multimodal optimization problem. It introduces a density-based objective function and adopts the fitness proportionate sharing strategy to search for the cluster centers more effectively. To capture the dynamic characteristics of streaming data, a recursive formula for the lower bound of the density function is derived, and a summary of historical data is maintained for the proposed algorithm. The proposed algorithm is applied to different data sets and compared comprehensively with five well-known data stream clustering algorithms from the literature; the results show comparable or better performance. A scalability analysis of the proposed streaming clustering method is also presented.
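To make the idea of treating cluster centers as the peaks of a multimodal density function more concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a Gaussian kernel density as the objective and uses classic niche-count fitness sharing as a stand-in for the fitness proportionate sharing strategy, whose exact form is not given in the abstract. The function names and the `bandwidth` and `niche_radius` parameters are hypothetical.

```python
import numpy as np


def density(points, data, bandwidth=1.0):
    """Gaussian kernel density of each candidate point with respect to the data.

    Assumption: a Gaussian kernel is used purely for illustration; the paper's
    density-based objective may differ.
    """
    # Pairwise squared Euclidean distances, shape (n_points, n_data).
    d2 = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)


def shared_fitness(candidates, data, bandwidth=1.0, niche_radius=1.0):
    """Classic fitness sharing: raw fitness divided by the niche count.

    Stand-in for fitness proportionate sharing (an assumption, not the
    paper's formula). Candidates crowded around one peak split its fitness,
    which leaves other peaks attractive to the search.
    """
    raw = density(candidates, data, bandwidth)
    # Pairwise distances between candidates, shape (n_cand, n_cand).
    diff = candidates[:, None, :] - candidates[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Triangular sharing function: 1 at distance 0, 0 at or beyond niche_radius.
    sharing = np.clip(1.0 - dist / niche_radius, 0.0, None)
    niche_count = sharing.sum(axis=1)  # at least 1, since each candidate counts itself
    return raw / niche_count
```

In this sketch, several candidates sitting on the same density peak each receive only a fraction of that peak's fitness, while a lone candidate on another peak keeps its full value. That penalty on crowding is what lets a sharing-based search locate several cluster centers at once rather than converging on the single densest region.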
