Kaskade: Graph Views for Efficient Graph Analytics

Graphs are a natural way to model real-world entities and the relationships between them, in domains ranging from social networks to data lineage graphs and biological datasets. Queries over these large graphs often involve expensive subgraph traversals and complex analytical computations. Real-world graphs are often substantially more structured than a generic vertex-and-edge model would suggest, yet existing graph engines have largely left this structure unexploited for query optimization. In this work, we leverage structural properties of graphs and queries to automatically derive materialized graph views that can dramatically speed up query evaluation. We present Kaskade, the first graph query optimization framework to exploit materialized graph views. Kaskade employs a novel constraint-based view enumeration technique that mines constraints from query workloads and graph schemas and injects them during view enumeration, significantly reducing the search space of candidate views. Kaskade also introduces a graph view size estimator, which it uses both to pick the most beneficial views to materialize for a given query set and to select the best query evaluation plan given a set of materialized views. We evaluate Kaskade over real-world graphs, including the provenance graph that we maintain at Microsoft to enable auditing, service analytics, and advanced system optimizations. Our results show that Kaskade substantially reduces the effective graph size and yields speedups of up to 50x, in some cases making otherwise intractable queries feasible.
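The idea of injecting constraints during view enumeration, and of ranking the surviving candidates by estimated size, can be illustrated with a minimal Python sketch. It is not Kaskade's implementation. The toy schema, the edge labels (`writes`, `read_by`, `runs_on`), the restriction of candidate views to k-hop label paths, and the independence-based size estimate are all hypothetical assumptions chosen for illustration. Two constraints prune the search while it runs: only edge labels that the workload's queries mention are considered, and consecutive edges must agree on the shared vertex type according to the schema.

```python
from itertools import product

# Hypothetical toy schema for a provenance-style graph:
# edge label -> (source vertex type, target vertex type).
SCHEMA = {
    "writes": ("Job", "File"),
    "read_by": ("File", "Job"),
    "runs_on": ("Job", "Machine"),
}


def unconstrained_views(schema, k):
    """All label sequences of length k, ignoring every constraint."""
    return list(product(sorted(schema), repeat=k))


def enumerate_views(schema, workload_labels, k):
    """Enumerate hypothetical k-hop path views, pruning during the search:
    (1) only labels used by the workload are considered (workload constraint);
    (2) consecutive edges must share a vertex type (schema constraint)."""
    labels = sorted(label for label in schema if label in workload_labels)
    views = []

    def extend(path):
        if len(path) == k:
            views.append(tuple(path))
            return
        for label in labels:
            # The previous edge's target type must equal this edge's source type.
            if not path or schema[path[-1]][1] == schema[label][0]:
                path.append(label)
                extend(path)
                path.pop()

    extend([])
    return views


def estimate_view_size(path, schema, edge_counts, vertex_counts):
    """Hypothetical size estimate assuming independence between hops:
    start from the first label's edge count, then multiply by the average
    out-degree (edges of that label per source vertex) for each extra hop."""
    estimate = float(edge_counts[path[0]])
    for label in path[1:]:
        estimate *= edge_counts[label] / vertex_counts[schema[label][0]]
    return estimate


if __name__ == "__main__":
    workload = {"writes", "read_by"}  # labels mentioned by lineage queries
    print(len(unconstrained_views(SCHEMA, 2)))  # 9
    views = enumerate_views(SCHEMA, workload, 2)
    print(views)  # [('read_by', 'writes'), ('writes', 'read_by')]
    edge_counts = {"writes": 1000, "read_by": 4000, "runs_on": 500}
    vertex_counts = {"Job": 500, "File": 1000, "Machine": 50}
    # Rank the surviving candidates by estimated size (smaller is cheaper to store).
    for view in sorted(views, key=lambda v: estimate_view_size(v, SCHEMA, edge_counts, vertex_counts)):
        print(view, estimate_view_size(view, SCHEMA, edge_counts, vertex_counts))
```

In this toy setting, the two constraints shrink the nine unconstrained two-hop candidates to two. Because pruning happens inside the recursive extension rather than after generating every sequence, invalid prefixes are never expanded, which is the property that matters as k grows.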
