ScalaRDF: A Distributed, Elastic and Scalable In-Memory RDF Triple Store

The Resource Description Framework (RDF) and the SPARQL query language are gaining increasing popularity and acceptance. RDF datasets have reached the scale of billions of triples, which has led to a proliferation of distributed RDF store systems within the Semantic Web community. However, elasticity and performance issues are still far from settled in the face of exploding data volumes and workload spikes. In addition, providers are under great pressure to provision uninterrupted, reliable storage services while reducing operational costs, despite a variety of system failures. How to realize system fault tolerance efficiently therefore remains an intractable problem. In this paper, we introduce ScalaRDF, a distributed and elastic in-memory RDF triple store that provides a fault-tolerant and scalable RDF store and query mechanism. Specifically, we describe a consistent hashing protocol that optimizes RDF data placement and data operations (especially online RDF triple update operations) and achieves autonomous, elastic data redistribution when cluster nodes join or depart, avoiding a cluster-wide reshuffle of the stored data. In addition, the data store achieves rapid and transparent failover through a replication mechanism that stores an in-memory data replica on the next hash hop. Experiments demonstrate that query time and update time are reduced by 87% and 90%, respectively, compared to other approaches. For an 18G source dataset, data redistribution takes at most 60 seconds when the system scales out, and recovery takes at most 100 seconds when nodes undergo crash-stop failures.
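The placement and failover idea described above can be illustrated with a generic consistent hash ring. The following is a minimal sketch, not the ScalaRDF implementation: the class `HashRing`, the helper `ring_hash`, the node names, the use of MD5, and the choice of the triple's subject as the hash key are all assumptions made for illustration, and the abstract does not specify any of them. Each key is owned by the first node clockwise from its hash position, and its replica is placed on the node after that (the "next hash hop").

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent hash ring with one replica on the next hop."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions of all nodes
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_hash(node)
        if pos in self._owner:
            raise ValueError(f"ring position already taken: {node}")
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_hash(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def primary_and_replica(self, key: str):
        """Return (primary, replica) node names for a key.

        The primary is the first node clockwise from the key's position;
        the replica is the node one hop further. With a single node,
        both are the same node.
        """
        if not self._positions:
            raise LookupError("ring has no nodes")
        n = len(self._positions)
        i = bisect.bisect_right(self._positions, ring_hash(key)) % n
        primary = self._owner[self._positions[i]]
        replica = self._owner[self._positions[(i + 1) % n]]
        return primary, replica


if __name__ == "__main__":
    # Assumption: triples are placed by hashing their subject IRI.
    subjects = [f"http://example.org/s{i}" for i in range(1000)]
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    before = {s: ring.primary_and_replica(s)[0] for s in subjects}

    ring.add_node("node-e")  # scale out by one node
    after = {s: ring.primary_and_replica(s)[0] for s in subjects}

    moved = [s for s in subjects if before[s] != after[s]]
    print(f"{len(moved)} of {len(subjects)} subjects moved, all to node-e:",
          all(after[s] == "node-e" for s in moved))
```

Two properties of this sketch mirror the claims in the abstract. When a node joins, only keys that fall into the new node's arc change owner, so the rest of the cluster is left untouched. When a node leaves, every key it owned is taken over by the node that already held its replica, which is what makes a transparent failover possible in principle.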
