A Distributed Graph Engine for Web Scale RDF Data

Much work has been devoted to supporting RDF data. But state-of-the-art systems and methods still cannot handle web scale RDF data effectively. Furthermore, many useful and general purpose graph-based operations (e.g., random walk, reachability, community discovery) on RDF data are not supported, as most existing systems store and index data in particular ways (e.g., as relational tables or as a bitmap matrix) to maximize one particular operation on RDF data: SPARQL query processing. In this paper, we introduce Trinity. RDF, a distributed, memory-based graph engine for web scale RDF data. Instead of managing the RDF data in triple stores or as bitmap matrices, we store RDF data in its native graph form. It achieves much better (sometimes orders of magnitude better) performance for SPARQL queries than the state-of-the-art approaches. Furthermore, since the data is stored in its native graph form, the system can support other operations (e.g., random walks, reachability) on RDF graphs as well. We conduct comprehensive experimental studies on real life, web scale RDF data to demonstrate the effectiveness of our approach.

[1]  Dave Reynolds,et al.  Efficient RDF Storage and Retrieval in Jena2 , 2003, SWDB.

[2]  Daniel J. Abadi,et al.  Scalable SPARQL querying of large RDF graphs , 2011, Proc. VLDB Endow..

[3]  Philip S. Yu,et al.  Mining top-K large structural patterns in a massive network , 2011, Proc. VLDB Endow..

[4]  Kewei Tu,et al.  An Approach to RDF(S) Query, Manipulation and Inference on Databases , 2005, WAIM.

[5]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[6]  Philip A. Bernstein,et al.  Using Semi-Joins to Solve Relational Queries , 1981, JACM.

[7]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[8]  Claudio Gutiérrez,et al.  Bipartite Graphs as Intermediate Model for RDF , 2004, SEMWEB.

[9]  Haixun Wang,et al.  The Trinity Graph Engine , 2012 .

[10]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[11]  Lei Zou,et al.  DistanceJoin: Pattern Match Query In a Large Graph Database , 2009, Proc. VLDB Endow..

[12]  Claudio Gutiérrez,et al.  Querying RDF Data from a Graph Database Perspective , 2005, ESWC.

[13]  Jonathan W. Berry,et al.  Challenges in Parallel Graph Processing , 2007, Parallel Process. Lett..

[14]  James A. Hendler,et al.  Matrix "Bit" loaded: a scalable lightweight join query processor for RDF data , 2010, WWW '10.

[15]  Haixun Wang,et al.  Managing and mining large graphs: systems and implementations , 2012, SIGMOD Conference.

[16]  Gerhard Weikum,et al.  The RDF-3X engine for scalable management of RDF data , 2010, The VLDB Journal.

[17]  Jeff Heflin,et al.  LUBM: A benchmark for OWL knowledge base systems , 2005, J. Web Semant..

[18]  Vassilis Christophides,et al.  The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases , 2001, SemWeb.

[19]  Philip S. Yu,et al.  Dual Labeling: Answering Graph Reachability Queries in Constant Time , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[20]  Richard E. Schantz,et al.  High-performance, massively scalable distributed systems using the MapReduce software framework: the SHARD triple-store , 2010, PSI EtA '10.

[21]  Frank van Harmelen,et al.  Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema , 2002, SEMWEB.

[22]  Jianzhong Li,et al.  Efficient Subgraph Matching on Billion Node Graphs , 2012, Proc. VLDB Endow..

[23]  Philip S. Yu,et al.  Fast Graph Pattern Matching , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[24]  Eugene Inseok Chong,et al.  An Efficient SQL-based RDF Querying Scheme , 2005, VLDB.

[25]  Jürgen Umbrich,et al.  YARS2: A Federated Repository for Querying Graph Structured Data from the Web , 2007, ISWC/ASWC.

[26]  Haixun Wang,et al.  Probase: a probabilistic taxonomy for text understanding , 2012, SIGMOD Conference.

[27]  Gerhard Weikum,et al.  Scalable join processing on very large RDF graphs , 2009, SIGMOD Conference.

[28]  Ambuj K. Singh,et al.  Graphs-at-a-time: query language and access methods for graph databases , 2008, SIGMOD Conference.

[29]  Daniel J. Abadi,et al.  SW-Store: a vertically partitioned DBMS for Semantic Web data management , 2009, The VLDB Journal.

[30]  Haixun Wang,et al.  Trinity: a distributed graph engine on a memory cloud , 2013, SIGMOD '13.

[31]  Bhavani M. Thuraisingham,et al.  Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing , 2011, IEEE Transactions on Knowledge and Data Engineering.

[32]  Annika Hinze,et al.  Storing RDF as a graph , 2003, Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726).

[33]  Abraham Bernstein,et al.  Hexastore: sextuple indexing for semantic web data management , 2008, Proc. VLDB Endow..

[34]  Gerhard Weikum,et al.  RDF-3X: a RISC-style engine for RDF , 2008, Proc. VLDB Endow..

[35]  Orri Erling,et al.  Virtuoso: RDF Support in a Native RDBMS , 2009, Semantic Web Information Management.

[36]  Bo Zong,et al.  Towards effective partition management for large graphs , 2012, SIGMOD Conference.

[37]  Dave Reynolds,et al.  SPARQL basic graph pattern optimization using selectivity estimation , 2008, WWW.