A Stitch in Time Saves Nine - SPARQL querying of Property Graphs using Gremlin Traversals

Knowledge graphs have become popular over the past years and frequently rely on the Resource Description Framework (RDF) or Property Graphs (PG) as underlying data models. However, the query languages for these two data models (SPARQL for RDF and Gremlin for property graph traversal) lack interoperability. We present Gremlinator, a novel SPARQL-to-Gremlin translator. Gremlinator translates SPARQL queries into Gremlin traversals in order to execute graph pattern matching queries over graph databases. This allows users to access and query a wide variety of Graph Data Management Systems (DMSs) using the W3C-standardized SPARQL query language, avoiding the learning curve of a new graph query language. Gremlin is a system-agnostic traversal language that covers both OLTP graph databases and OLAP graph processors, which makes it a desirable choice for supporting interoperability when querying Graph DMSs. We present a comprehensive empirical evaluation of Gremlinator and demonstrate its validity and applicability by executing SPARQL queries on top of the leading graph stores Neo4J, Sparksee, and Apache TinkerGraph, and we compare the performance with that of the RDF stores Virtuoso, 4Store, and JenaTDB. Our evaluation demonstrates the substantial performance gain obtained by the Gremlin counterparts of the SPARQL queries, especially for star-shaped and complex queries.
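To make the idea of translating a SPARQL graph pattern into a Gremlin traversal more concrete, the following is a minimal sketch in Python. It is a toy illustration, not Gremlinator's implementation: the function names `pattern_to_step` and `bgp_to_gremlin` are hypothetical, and the convention that `e:` predicates denote edges and `v:` predicates denote property values is an assumption made for this example. It maps each triple pattern to one clause of a Gremlin `match()` step and emits the traversal as a string.

```python
# Toy sketch only: NOT Gremlinator's actual translation algorithm.
# Assumed (hypothetical) convention for predicates:
#   "e:<label>" -> an outgoing edge with that label
#   "v:<key>"   -> a property value stored under that key


def pattern_to_step(subject, predicate, obj):
    """Map one triple pattern to one Gremlin match() clause (as a string)."""
    kind, _, name = predicate.partition(":")
    subj_var = subject.lstrip("?")
    start = "__.as('" + subj_var + "')"
    if kind == "e":
        if not obj.startswith("?"):
            raise ValueError("this sketch only supports variable edge targets")
        # Edge pattern: follow the outgoing edge and bind the target vertex.
        return start + ".out('" + name + "').as('" + obj[1:] + "')"
    if kind == "v":
        if obj.startswith("?"):
            # Variable object: bind the property value to the variable.
            return start + ".values('" + name + "').as('" + obj[1:] + "')"
        # Constant object: filter on the property value.
        return start + ".has('" + name + "', " + obj + ")"
    raise ValueError("unsupported predicate: " + predicate)


def bgp_to_gremlin(select_vars, patterns):
    """Translate a basic graph pattern into a Gremlin match() traversal string."""
    clauses = ", ".join(pattern_to_step(*p) for p in patterns)
    projected = ", ".join("'" + v.lstrip("?") + "'" for v in select_vars)
    return "g.V().match(" + clauses + ").select(" + projected + ")"


# SELECT ?name WHERE { ?p e:knows ?f . ?f v:name ?name }
query = bgp_to_gremlin(
    ["?name"],
    [("?p", "e:knows", "?f"), ("?f", "v:name", "?name")],
)
print(query)
```

Each triple pattern becomes an independent `match()` clause that shares variable labels with the others, which mirrors how the conjunctive patterns of a SPARQL query share variables. A real translator would also have to handle filters, optional patterns, unions, and solution modifiers, none of which this sketch attempts.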
