Graph-Oriented Load Shedding for Semantic Data Stream Processing

The need to extract knowledge from continuous data streams has been growing exponentially, and this has given rise to a new research axis within the Semantic Web community. In the last few years, many semantic data stream processing systems have been proposed that combine Data Stream Management System (DSMS) technologies with Semantic Web technologies (RDF/SPARQL) to annotate, publish, and reason over these data streams. However, given their unbounded volume and unknown velocity, processing and storing the entire content of these streams is impossible, which has led to the introduction of techniques for shedding load and/or summarizing data. In this context, we propose a graph-oriented approach to reduce the volume of semantic data streams. To validate our approach, we implemented it using Simple Random Sampling and Stratified Random Sampling and evaluated it with the CSRBench benchmark. Our approach preserves both the consistency of the data and its semantic level.
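
The abstract does not say how graphs are delimited or how strata are chosen, so the following is only a minimal sketch under loudly stated assumptions. It assumes that, within one window, a "graph" is the set of triples sharing a subject (for example, one sensor observation), and that strata are defined by each graph's rdf:type object; both rules, and every function name below (group_into_graphs, simple_random_graph_sample, stratified_graph_sample, make_window), are hypothetical and are not the authors' implementation. What the sketch does show is the general idea of graph-level shedding: Simple Random Sampling and Stratified Random Sampling are applied to whole graphs rather than to individual triples, so every retained observation keeps all of its triples.

```python
import random
from collections import defaultdict

# Assumption for this sketch: a triple is a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"


def group_into_graphs(window: list[Triple]) -> dict[str, list[Triple]]:
    """Group one window's triples into graphs keyed by subject (hypothetical grouping rule)."""
    graphs: dict[str, list[Triple]] = defaultdict(list)
    for triple in window:
        graphs[triple[0]].append(triple)
    return dict(graphs)


def simple_random_graph_sample(window: list[Triple], rate: float,
                               rng: random.Random) -> list[Triple]:
    """Simple Random Sampling without replacement over whole graphs, not triples."""
    graphs = group_into_graphs(window)
    keys = sorted(graphs)  # fixed order so a seeded rng gives reproducible samples
    kept = rng.sample(keys, round(rate * len(keys)))
    return [t for key in kept for t in graphs[key]]


def stratum_of(graph: list[Triple]) -> str:
    """Stratify by the graph's rdf:type object (an assumption, not the paper's rule)."""
    for _, p, o in graph:
        if p == RDF_TYPE:
            return o
    return "untyped"


def stratified_graph_sample(window: list[Triple], rate: float,
                            rng: random.Random) -> list[Triple]:
    """Stratified Random Sampling over whole graphs with proportional allocation.

    Each non-empty stratum keeps at least one graph when rate > 0,
    so rare graph types are never shed entirely.
    """
    graphs = group_into_graphs(window)
    strata: dict[str, list[str]] = defaultdict(list)
    for key in sorted(graphs):
        strata[stratum_of(graphs[key])].append(key)
    sample: list[Triple] = []
    for stratum in sorted(strata):
        keys = strata[stratum]
        k = max(1, round(rate * len(keys))) if rate > 0 else 0
        for key in rng.sample(keys, k):
            sample.extend(graphs[key])
    return sample


def make_window() -> list[Triple]:
    """Hypothetical window: 10 temperature and 2 wind observations, 3 triples each."""
    window: list[Triple] = []
    for i in range(10):
        obs = f"ex:temp{i}"
        window += [(obs, RDF_TYPE, "ex:TemperatureObservation"),
                   (obs, "ex:value", str(20 + i)),
                   (obs, "ex:sensor", "ex:s1")]
    for i in range(2):
        obs = f"ex:wind{i}"
        window += [(obs, RDF_TYPE, "ex:WindObservation"),
                   (obs, "ex:value", str(5 + i)),
                   (obs, "ex:sensor", "ex:s2")]
    return window


if __name__ == "__main__":
    rng = random.Random(42)
    window = make_window()
    srs = simple_random_graph_sample(window, 0.2, rng)
    strat = stratified_graph_sample(window, 0.2, rng)
    print(len(window), len(srs), len(strat))  # → 36 6 9
```

At a rate of 0.2, simple random sampling keeps 2 of the 12 graphs and may drop both wind observations, whereas the stratified variant keeps 2 temperature graphs and 1 wind graph. The one-graph floor per stratum trades a slightly higher kept fraction for never silencing a rare graph type; removing it gives strict proportional allocation.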
