MVStream: Multiview Data Stream Clustering

This article studies a new problem in data stream clustering: multiview data stream (MVStream) clustering. Although many data stream clustering algorithms have been developed, they are restricted to single-view streaming data, and clustering MVStreams remains largely unsolved. Beyond the issues already faced by conventional single-view data stream clustering, such as capturing cluster evolution and discovering clusters of arbitrary shapes under limited computational resources, the main challenge of MVStream clustering lies in integrating information from multiple views in a streaming manner while simultaneously abstracting summary statistics from the integrated features. In this article, we propose, for the first time, an MVStream clustering algorithm. The main idea is to design a multiview support vector domain description (MVSVDD) model, by which the information from multiple insufficient views can be integrated, and whose output support vectors (SVs) are used to abstract the summary statistics of the historical multiview data objects. Based on the MVSVDD model, a new multiview cluster labeling method is designed, whereby clusters of arbitrary shapes can be discovered in each view. By tracking the cluster labels of the SVs in each view, the cluster evolution associated with concept drift can be captured. Because the SVs account for only a small portion of the data objects, the proposed MVStream algorithm is efficient under limited computational resources. Extensive experiments are conducted to demonstrate the effectiveness and efficiency of the proposed method.
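
To make the role of the SVs as summary statistics concrete, the following is a minimal sketch of ordinary single-view support vector domain description, not the MVSVDD model proposed in this article. It assumes a Gaussian kernel and no outliers (hard margin), in which case the dual problem reduces to minimising a^T K a over the probability simplex; a simple Frank-Wolfe iteration is used here as the solver purely for illustration. All names (`fit_svdd`, `squared_distance`, the kernel width `q`, the threshold `1e-3`) are hypothetical choices for this sketch.

```python
import math


def gaussian_kernel(x, y, q=1.0):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2)."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_svdd(points, q=1.0, iters=500):
    """Hard-margin SVDD dual, solved approximately by Frank-Wolfe.

    Assumption: Gaussian kernel (K(x, x) = 1) and no outliers, so the dual
    is: minimise a^T K a subject to a_i >= 0 and sum(a) = 1.
    """
    n = len(points)
    K = [[gaussian_kernel(p, r, q) for r in points] for p in points]
    alpha = [1.0 / n] * n
    for t in range(iters):
        # Gradient of a^T K a is 2 K a.
        grad = [2.0 * sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        i_min = min(range(n), key=grad.__getitem__)
        step = 2.0 / (t + 2.0)
        # Move towards the simplex vertex with the smallest gradient entry.
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[i_min] += step
    return alpha, K


def squared_distance(x, points, alpha, K, q=1.0):
    """Squared feature-space distance from x to the sphere centre."""
    n = len(points)
    cross = sum(alpha[i] * gaussian_kernel(points[i], x, q) for i in range(n))
    const = sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))
    return 1.0 - 2.0 * cross + const


points = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.0), (0.0, 1.0),
          (1.0, 1.0), (0.5, 0.5), (0.4, 0.9), (0.9, 0.4)]
alpha, K = fit_svdd(points)
# Hard margin: the sphere must enclose every training point.
radius_sq = max(squared_distance(p, points, alpha, K) for p in points)
# Points with non-negligible weight act as the compact summary of the data.
support_vectors = [p for p, a in zip(points, alpha) if a > 1e-3]
print("support vectors:", support_vectors)
print("far point outside:", squared_distance((10.0, 10.0), points, alpha, K) > radius_sq)
```

In this single-view setting, only the points with non-negligible weight need to be retained to describe the data boundary, which illustrates why keeping SVs alone can be economical. How the views are integrated and how cluster labels are assigned in MVStream is not covered by this sketch.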