DRSS: Distributed RDF SPARQL Streaming

In this work, we present DRSS, a distributed and scalable engine for RDF stream processing. DRSS proposes a new query syntax for continuous querying of RDF data streams. Among other components, the system includes three efficient algorithms for (1) rewriting continuous queries that share common sub-structures, (2) partitioning SPARQL queries across multiple computing nodes according to an efficient distribution strategy, and (3) query-based data distribution, which lets sub-queries be processed locally and minimizes the data exchanged across nodes. Our system combines the processing of real-time data from multiple sources with the processing of stored RDF data. DRSS and all of its algorithms are implemented on the real-time data processing platform Storm, which provides mechanisms for parallelizing query operators. DRSS is evaluated on a real dataset containing up to 1 million RDF graphs. The experimental results confirm the scalability and effectiveness of our system.
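The abstract does not give the algorithms themselves, so the following is only a minimal illustrative sketch of two of the ideas it names: detecting triple patterns shared by several continuous queries, and routing an incoming triple only to the queries it can contribute to. The tuple representation of triple patterns and the helper names (`canonical`, `shared_patterns`, `matches`, `route`) are assumptions made for this example and are not part of DRSS.

```python
from collections import defaultdict

# Assumption: a triple pattern is a (subject, predicate, object) tuple of strings,
# and any term starting with "?" is a variable.

def canonical(pattern):
    """Rename variables by order of first appearance, so that patterns that
    differ only in variable names compare equal."""
    names = {}
    out = []
    for term in pattern:
        if term.startswith("?"):
            names.setdefault(term, f"?v{len(names)}")
            out.append(names[term])
        else:
            out.append(term)
    return tuple(out)

def shared_patterns(queries):
    """Return the canonical patterns that occur in more than one query,
    mapped to the set of query ids that contain them."""
    owners = defaultdict(set)
    for qid, patterns in queries.items():
        for pattern in patterns:
            owners[canonical(pattern)].add(qid)
    return {p: qids for p, qids in owners.items() if len(qids) > 1}

def matches(pattern, triple):
    """Check whether a concrete triple matches a pattern; a variable that
    appears twice must bind to the same value both times."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if bindings.setdefault(term, value) != value:
                return False
        elif term != value:
            return False
    return True

def route(triple, queries):
    """Return the sorted ids of the queries with at least one pattern
    matched by the triple."""
    return sorted(
        qid
        for qid, patterns in queries.items()
        if any(matches(p, triple) for p in patterns)
    )

# Hypothetical continuous queries over a sensor stream.
queries = {
    "q1": [("?s", "rdf:type", "ex:Sensor"), ("?s", "ex:temp", "?t")],
    "q2": [("?x", "rdf:type", "ex:Sensor"), ("?x", "ex:humidity", "?h")],
}

common = shared_patterns(queries)
temp_targets = route(("ex:s1", "ex:temp", "21"), queries)
type_targets = route(("ex:s1", "rdf:type", "ex:Sensor"), queries)
```

In this toy setting the `rdf:type` pattern is shared by both queries and could be evaluated once, while a temperature triple only needs to reach the node hosting `q1`. A real engine would also have to handle joins, windows, and placement across nodes, none of which is modelled here.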
