Approximate Evaluation of Label-Constrained Reachability Queries

The current surge of interest in graph-based data models mirrors the growing use of increasingly complex reachability queries, as witnessed by recent analytical studies of real-world graph query logs. Despite the maturity of graph DBMS capabilities, complex label-constrained reachability queries, along with their corresponding aggregate versions, remain difficult to evaluate. In this paper, we focus on the approximate evaluation of counting label-constrained reachability queries. We offer a human-explainable solution to graph Approximate Query Processing (AQP). It consists of a summarization algorithm (GRASP) and a custom visualization plug-in that allows users to explore the obtained summaries. We prove that the problem of node group minimization, associated with the creation of GRASP summaries, is NP-complete. Nonetheless, our GRASP summaries are reasonably small in practice, even for large graph instances, and guarantee approximate graph query answering, paired with controllable error estimates. We experimentally gauge the scalability and efficiency of our GRASP algorithm, and verify the accuracy and error estimation of the graph AQP module. To the best of our knowledge, ours is the first system capable of handling visualization-driven approximate graph analytics for complex label-constrained reachability queries.
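
To make the query class concrete, the following minimal sketch computes the exact answer to one simple counting label-constrained reachability query: the number of ordered node pairs (u, v), with u distinct from v, such that v is reachable from u using only edges whose labels lie in a given set. This is a naive baseline for illustration only, not the GRASP algorithm. The function name, the edge-triple encoding, and the sample graph are all assumptions introduced here.

```python
from collections import deque


def count_label_constrained_reachable(edges, allowed_labels):
    """Count ordered pairs (u, v), u != v, where v is reachable from u
    along a path whose edge labels all belong to allowed_labels.

    `edges` is an iterable of (source, label, target) triples; this
    encoding is an illustrative assumption, not taken from the paper.
    """
    adjacency = {}
    nodes = set()
    for source, label, target in edges:
        nodes.update((source, target))
        # Keep only edges that satisfy the label constraint.
        if label in allowed_labels:
            adjacency.setdefault(source, []).append(target)

    pair_count = 0
    for start in nodes:
        # Breadth-first search over the label-filtered graph.
        visited = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        # Exclude the start node itself from the count.
        pair_count += len(visited) - 1
    return pair_count


# Hypothetical toy graph: a -knows-> b -knows-> c -likes-> d
edges = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "likes", "d"),
]
exact_count = count_label_constrained_reachable(edges, {"knows"})
print(exact_count)
```

This baseline runs one traversal per node, so its cost grows with the product of the number of nodes and the number of edges. That cost is what makes exact evaluation unattractive on large graphs and motivates answering such counts approximately over a smaller summary.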
