Fast Processing SPARQL Queries on Large RDF Data

The RDF (Resource Description Framework) data model has been used in various domains, such as the Web, government, and biology. The volume of RDF datasets is now growing significantly, which raises a serious challenge: how to answer SPARQL queries on large RDF datasets efficiently. Here, we present TripleParallel, a large-scale RDF data system that implements block-based parallel processing of SPARQL queries on RDF datasets with billions of triples. The system increases parallelism and the overlap between data access and computation, reducing the overall execution time of a query. TripleParallel also implements multiple parallel operations for processing joins in parallel. Experimental studies with several RDF datasets, including LUBM and the UniProt collection, demonstrate the performance gains of our approach, which outperforms the previous fastest system by more than an order of magnitude.
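To make the idea of block-based parallel processing concrete, the following is a minimal Python sketch, not TripleParallel's actual implementation. It assumes triples are held in memory as tuples, splits them into fixed-size blocks, scans each block for a triple pattern on a separate worker, and combines two patterns with a simple hash join. All names (`split_into_blocks`, `scan_block`, `parallel_scan`, `hash_join`) and the sample data are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(triples, block_size):
    """Partition the triple list into consecutive fixed-size blocks."""
    return [triples[i:i + block_size] for i in range(0, len(triples), block_size)]


def scan_block(block, pattern):
    """Match one triple pattern against one block.

    Pattern terms starting with '?' are variables; other terms are constants.
    Returns a list of variable bindings (dicts).
    """
    results = []
    for triple in block:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def parallel_scan(blocks, pattern, workers=4):
    """Scan all blocks concurrently and concatenate the per-block results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_block = pool.map(lambda b: scan_block(b, pattern), blocks)
        return [binding for part in per_block for binding in part]


def hash_join(left, right, var):
    """Join two binding lists on a shared variable using a hash index."""
    index = {}
    for b in left:
        index.setdefault(b[var], []).append(b)
    return [{**l, **r} for r in right for l in index.get(r[var], [])]


# Hypothetical sample data, loosely in the style of a university benchmark.
triples = [
    ("alice", "type", "Student"),
    ("bob", "type", "Student"),
    ("alice", "takes", "db101"),
    ("carol", "type", "Professor"),
    ("bob", "takes", "ai201"),
]
blocks = split_into_blocks(triples, 2)

# Roughly: SELECT ?x ?c WHERE { ?x type Student . ?x takes ?c }
students = parallel_scan(blocks, ("?x", "type", "Student"))
courses = parallel_scan(blocks, ("?x", "takes", "?c"))
joined = hash_join(students, courses, "?x")
```

Threads are used here only to keep the sketch short. In CPython they do not provide true CPU parallelism for this kind of work, so a real system would rely on native threads or processes and on a more compact storage layout than Python tuples.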
