Hashing for Adaptive Real-Time Graph Stream Classification With Concept Drifts

Many applications must process networked streaming data in a timely manner. Graph stream classification aims to learn a classification model from a stream of graphs in a single pass over the data, which requires real-time processing in both training and prediction. This is a nontrivial task: many existing methods need multiple passes over the graph stream to extract subgraph structures as features for graph classification, and therefore cannot simultaneously satisfy the “one-pass” and “real-time” requirements. In this paper, we propose an adaptive real-time graph stream classification method to address this challenge. We partition the unbounded graph stream into consecutive graph chunks, each consisting of a fixed number of graphs and yielding a corresponding chunk-level classifier. When training the chunk-level classifiers, we employ a random hashing function to compress the original node set of the graphs in each chunk for fast feature detection. Furthermore, a differential hashing strategy maps the unboundedly growing set of features (i.e., cliques) into a fixed-size feature space, which then serves as the feature vector for stochastic learning. Finally, the chunk-level classifiers are weighted in an ensemble learning model for graph classification. The proposed method substantially speeds up graph feature extraction and avoids unbounded growth of the graph feature set. Moreover, it effectively offsets concept drifts in graph stream classification. Experiments on real-world and synthetic graph streams demonstrate that our method significantly outperforms existing methods in both classification accuracy and learning efficiency.
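To make the pipeline above more concrete, the following is a minimal sketch of the hashing steps under stated assumptions; it is not the implementation evaluated in the paper. The function names (`stable_hash`, `compress_graph`, `maximal_cliques`, `clique_feature_vector`, `weighted_vote`), the parameters `num_buckets` and `dim`, and the use of BLAKE2 as the random hash are all hypothetical choices made for illustration. In particular, plain modular hashing of clique keys stands in for the differential hashing strategy, whose details the abstract does not give, and the ensemble weights are taken as inputs because the abstract does not say how they are computed.

```python
import hashlib


def stable_hash(key: str, seed: int, modulus: int) -> int:
    """Seeded, process-independent hash of `key` into range(modulus)."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


def compress_graph(edges, num_buckets, seed=0):
    """Hash every node label into one of `num_buckets` buckets and return
    the adjacency sets of the resulting compressed graph."""
    adj = {}
    for u, v in edges:
        cu = stable_hash(str(u), seed, num_buckets)
        cv = stable_hash(str(v), seed, num_buckets)
        if cu == cv:
            continue  # both endpoints collided: drop the self-loop
        adj.setdefault(cu, set()).add(cv)
        adj.setdefault(cv, set()).add(cu)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield tuple(sorted(r))
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    if adj:
        yield from expand(set(), set(adj), set())


def clique_feature_vector(edges, num_buckets=16, dim=32, seed=0):
    """Count the cliques of the compressed graph in a fixed-size vector.
    Assumption: a simple modular hash replaces differential hashing."""
    vec = [0] * dim
    for clique in maximal_cliques(compress_graph(edges, num_buckets, seed)):
        key = ",".join(map(str, clique))
        vec[stable_hash(key, seed + 1, dim)] += 1
    return vec


def weighted_vote(scores, weights):
    """Combine chunk-level classifier scores (positive means class +1)
    into one label; the weights are supplied by the caller."""
    total = sum(w * s for w, s in zip(weights, scores))
    return 1 if total >= 0 else -1
```

Because the node set is compressed to at most `num_buckets` nodes before clique enumeration, and every clique lands in one of `dim` slots, the feature vector keeps the same length no matter how many distinct cliques the stream eventually produces. Each graph in a chunk would be converted with `clique_feature_vector` and passed to whichever stochastic learner trains the chunk-level classifier.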
