Labeled graph sketches: Keeping up with real-time graph streams

Abstract

Graphs are fundamental data structures in many applications, such as road networks, social and communication networks, and web requests. In many of these applications, graph edges arrive as a stream and users are interested only in the recent data. Storing and processing such massive graph streams has therefore become a significant problem in data exploration. Treating the categorical attributes of vertices and edges as labels, we propose a labeled graph sketch that stores real-time graph structural information in sublinear space and supports diverse types of graph queries. The sketch also supports sliding-window queries. We conduct extensive experiments on real-world datasets from six different domains and compare our approach with a state-of-the-art method to demonstrate its accuracy, efficiency, and practicability.
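The abstract does not describe the sketch's internal layout. The Python code below illustrates the general idea of sublinear-space graph stream summaries. It is not the authors' design. It assumes a count-min-style structure in which vertex identifiers are hashed into a small `width x width` matrix in each of `depth` independent rows. Every cell keeps one counter per edge label, and a query returns the minimum over the rows. That minimum can overestimate an edge's weight but never underestimates it. To approximate sliding-window queries, the code keeps one such sub-sketch per time slice and discards expired slices. All class names and parameters (`LabeledCountMinGraph`, `SlidingWindowSketch`, `depth`, `width`, `slice_len`, `num_slices`) are hypothetical. Timestamps are assumed to arrive in non-decreasing order.

```python
import hashlib
from collections import deque


def _bucket(seed: int, key: str, width: int) -> int:
    """Deterministic seeded hash of a vertex id into [0, width)."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % width


class LabeledCountMinGraph:
    """Hypothetical sketch: depth hashed width x width matrices, one counter per edge label per cell.

    Vertices are compressed by hashing, so space is O(depth * width^2 * #labels)
    regardless of how many distinct vertices appear in the stream.
    """

    def __init__(self, depth: int = 4, width: int = 64):
        self.depth, self.width = depth, width
        # cells[i] maps (row bucket, column bucket, edge label) -> accumulated weight
        self.cells = [{} for _ in range(depth)]

    def _key(self, i: int, src: str, dst: str, label: str):
        return (_bucket(i, src, self.width), _bucket(i, dst, self.width), label)

    def add(self, src: str, dst: str, label: str, weight: int = 1) -> None:
        for i in range(self.depth):
            key = self._key(i, src, dst, label)
            self.cells[i][key] = self.cells[i].get(key, 0) + weight

    def edge_weight(self, src: str, dst: str, label: str) -> int:
        # Hash collisions only add weight, so the row minimum is an upper bound
        return min(self.cells[i].get(self._key(i, src, dst, label), 0)
                   for i in range(self.depth))


class SlidingWindowSketch:
    """Hypothetical window: one sub-sketch per time slice, oldest slices dropped."""

    def __init__(self, slice_len: int, num_slices: int, depth: int = 4, width: int = 64):
        self.slice_len, self.num_slices = slice_len, num_slices
        self.depth, self.width = depth, width
        self.slices = deque()  # (slice index, sub-sketch), oldest first

    def add(self, src: str, dst: str, label: str, t: int, weight: int = 1) -> None:
        idx = t // self.slice_len  # assumes non-decreasing timestamps
        if not self.slices or self.slices[-1][0] != idx:
            self.slices.append((idx, LabeledCountMinGraph(self.depth, self.width)))
        self.slices[-1][1].add(src, dst, label, weight)
        # Keep only the most recent num_slices slice indices
        while self.slices and self.slices[0][0] <= idx - self.num_slices:
            self.slices.popleft()

    def edge_weight(self, src: str, dst: str, label: str) -> int:
        # Each slice overestimates its own count, so the sum overestimates the window total
        return sum(s.edge_weight(src, dst, label) for _, s in self.slices)
```

Per slice, memory depends on `depth`, `width`, and the number of distinct edge labels, not on the number of distinct vertices. Larger `slice_len` values mean fewer live sub-sketches, at the cost of a coarser window boundary.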
