C-SPARQL Extension for Sampling RDF Graphs Streams

Our daily use of the Internet and related technologies continuously generates large amounts of heterogeneous data streams. Several RDF Stream Processing (RSP) systems have been proposed; they combine the advantages of semantic web technologies with those of traditional data stream management systems. C-SPARQL, CQELS, SPARQL\(_{stream}\), EP-SPARQL, and Sparkwave, all of which extend the semantic query language SPARQL, are examples of such systems. Because storing and processing all of these streams is expensive, we propose a solution that reduces the load while preserving data semantics and optimizing processing. In this paper, we extend C-SPARQL to continuously generate samples over streams of RDF graphs. We add three sampling operators (UNIFORM, RESERVOIR and CHAIN) to the C-SPARQL query syntax. These operators have been implemented in Esper, C-SPARQL’s data stream management module. Experiments evaluate the performance of our extension in terms of execution time and preservation of data semantics.
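
To make the idea behind the RESERVOIR operator concrete, the following is a minimal sketch of classic reservoir sampling (Algorithm R), written in Python for illustration. It is not the Esper implementation described in the paper. The function name `reservoir_sample`, the parameter `k`, and the representation of each stream element as a plain tuple are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")  # one stream element, e.g. one RDF graph (hypothetical representation)


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of at most k elements from a stream.

    Illustrative sketch of Algorithm R; every element seen so far has the
    same probability of being in the reservoir, and the stream length does
    not need to be known in advance.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k elements.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new element replaces an old one
            # only if the slot falls inside the reservoir (probability k/(i+1)).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical stream of 1000 elements, each standing in for an RDF graph.
    graphs = [("graph%d" % n, "ex:observedAt", str(n)) for n in range(1000)]
    print(reservoir_sample(graphs, 5, random.Random(42)))
```

The sketch keeps only `k` elements in memory regardless of stream length, which is the property that makes this family of techniques attractive for reducing the load on a stream processor.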
