An Approach For Concept Drift Detection in a Graph Stream Using Discriminative Subgraphs

Mining complex networks such as social media, sensor networks, and the World Wide Web has attracted considerable research interest. In a streaming scenario, the concept to be learned can change over time. Although concept drift detection has been studied for traditional data streams, little work has addressed concept drift in data represented as graphs. We propose a novel unsupervised concept drift detection method for graph streams called the Discriminative Subgraph-based Drift Detector (DSDD). The method first discovers discriminative subgraphs for each graph in the stream. It then computes the entropy of each window from the distribution of discriminative subgraphs over the graphs in that window, moving the sliding window forward one step at a time to obtain a series of entropy values. Finally, it applies direct density-ratio estimation to this series to detect concept drift. We demonstrate the effectiveness of the method through experiments on artificial and real-world datasets and evaluate its performance against related baseline methods. We also study the usefulness of the approach by incorporating it into a popular graph stream classification algorithm and measuring the impact of drift detection on classification accuracy.
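The windowed entropy step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each graph in the stream has already been reduced to a single hashable label identifying its discriminative subgraph, and it uses the Shannon entropy of the label frequencies in each window. The function names `window_entropy` and `entropy_series`, the label encoding, and the window size are all hypothetical. Subgraph discovery and the density-ratio drift test are not shown.

```python
import math
from collections import Counter


def window_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution in one window."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_series(stream_labels, window_size):
    """Slide a fixed-size window one step at a time and record its entropy."""
    return [
        window_entropy(stream_labels[i:i + window_size])
        for i in range(len(stream_labels) - window_size + 1)
    ]


# Hypothetical stream: one discriminative-subgraph label per graph (assumed encoding).
stream = ["A", "A", "B", "A", "C", "D", "C", "D"]
series = entropy_series(stream, window_size=4)
```

A change-point detector, such as the direct density-ratio estimator named in the abstract, would then be run over `series` to flag drift.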
