SparkRDF: In-Memory Distributed RDF Management Framework for Large-Scale Social Data

Considering the scalability and semantic requirements, Resource Description Framework (RDF) and the de-facto query language SPARQL are well suited for managing and querying online social network (OSN) data. Despite some existing works have introduced distributed framework for querying large-scale data, how to improve online query performance is still a challenging task. To address this problem, this paper proposes a scalable RDF data framework, which uses key-value store for offline RDF storage and pipelined in-memory based query strategy. The proposed framework efficiently supports SPARQL Basic Graph Pattern (BGP) queries on large-scale datasets. Experiments on the benchmark dataset demonstrate that the online SPARQL query performance of our framework outperforms existing distributed RDF solutions.

[1]  Gerhard Weikum,et al.  RDF-3X: a RISC-style engine for RDF , 2008, Proc. VLDB Endow..

[2]  Bhavani M. Thuraisingham,et al.  Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing , 2011, IEEE Transactions on Knowledge and Data Engineering.

[3]  Ioannis Konstantinou,et al.  H2RDF: adaptive query processing on RDF data in the cloud. , 2012, WWW.

[4]  Jeff Heflin,et al.  LUBM: A benchmark for OWL knowledge base systems , 2005, J. Web Semant..

[5]  James A. Hendler,et al.  BitMat: A Main-memory Bit Matrix of RDF Triples for Conjunctive Triple Pattern Queries , 2008, SEMWEB.

[6]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[7]  Haixun Wang,et al.  A Distributed Graph Engine for Web Scale RDF Data , 2013, Proc. VLDB Endow..

[8]  Sang-goo Lee,et al.  SPARQL basic graph pattern processing with iterative MapReduce , 2010, MDAC '10.

[9]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[10]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[11]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[12]  Abraham Bernstein,et al.  Hexastore: sextuple indexing for semantic web data management , 2008, Proc. VLDB Endow..

[13]  Orri Erling,et al.  Virtuoso: RDF Support in a Native RDBMS , 2009, Semantic Web Information Management.