An Algebra and Equivalences to Transform Graph Patterns in Neo4j

Modern query optimizers of relational database systems embody more than three decades of research and practice in the area of data management and processing. Key advances include algebraic query transformation, intelligent search space pruning, and modular optimizer architectures. Surprisingly, many of these contributions seem to have been overlooked so far in the emerging field of graph databases. In particular, we believe that query optimization based on a general graph algebra and its equivalences can greatly improve on the current state of the art. Although some graph algebras have already been proposed, they have often been developed in a context in which a relational database system serves as the backend for processing graph data. As a consequence, these algebras are typically tightly coupled to the relational algebra, which makes them unsuitable for native graph databases. While we support the approach of extending the relational algebra, we argue that graph-specific operations should be defined at a higher level, independent of the database backend. In this paper, we introduce such a general graph algebra and its corresponding equivalences, and we demonstrate how it can be used to optimize Cypher queries in the setting of the Neo4j native graph database.
