Hybrid query execution engine for large attributed graphs

Graphs are widely used for modeling complex data such as social networks, bibliographic networks and knowledge bases. The growing size of graph databases creates a pressing need for powerful and scalable graph-based query engines. We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language can express the types of graph queries that are of broad interest for databases modeled as large graphs, such as pattern matching, reachability and shortest path queries. Each query can combine structural predicates with value-based predicates (on the attributes of the graph nodes and edges). We describe an algebraic compilation mechanism for the proposed query language; it extends the relational algebra and is based on the Triple Pattern, the basic construct for building SPARQL queries. We describe an efficient hybrid memory/disk representation of large attributed graphs in which only the topology of the graph is maintained in memory, while the data of the graph are stored in a relational database. The execution engine splits the query plan: some parts are pushed inside the relational database (using SQL), while the remaining parts are processed using memory-based algorithms, as necessary. Experimental results on real and synthetic datasets demonstrate the efficiency and scalability of our approach and show that it outperforms native graph databases by several factors.
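
The hybrid split described above can be illustrated with a minimal sketch. It is not the G-SPARQL engine itself: the table name `node_attrs`, the toy graph, and the helper functions are all hypothetical, and SQLite stands in for the relational database. The value-based predicate is pushed to SQL, while the structural predicate (a bounded shortest-path check) runs over an in-memory adjacency list.

```python
import sqlite3
from collections import deque

# Assumption: node attributes live in a relational store (SQLite stands in here),
# while only the graph topology is kept in memory as adjacency lists.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node_attrs (id INTEGER PRIMARY KEY, label TEXT, age INTEGER)")
db.executemany(
    "INSERT INTO node_attrs VALUES (?, ?, ?)",
    [(1, "alice", 34), (2, "bob", 27), (3, "carol", 41), (4, "dave", 52)],
)

# Hypothetical directed toy graph: 1 -> 2 -> 3 -> 4
adjacency = {1: [2], 2: [3], 3: [4], 4: []}


def select_nodes(min_age):
    """Value-based predicate, pushed to the relational store as SQL."""
    rows = db.execute(
        "SELECT id FROM node_attrs WHERE age >= ? ORDER BY id", (min_age,)
    )
    return [row[0] for row in rows]


def shortest_path_length(source, target):
    """Structural predicate, evaluated in memory with breadth-first search."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target is not reachable from source


def hybrid_query(source, min_age, max_hops):
    """Return (node, distance) pairs that satisfy both kinds of predicate."""
    result = []
    for node in select_nodes(min_age):           # relational part
        dist = shortest_path_length(source, node)  # in-memory part
        if dist is not None and dist <= max_hops:
            result.append((node, dist))
    return result


print(hybrid_query(source=1, min_age=40, max_hops=2))
```

In this sketch the SQL filter runs first so that the in-memory traversal is only attempted for nodes that already satisfy the attribute predicate; a real engine would choose the order of the two parts based on the query plan.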
