Patorc: Pattern Oriented Compression for Semantic Data Streams

Semantic data streams were recently proposed as a way to cope with the heterogeneity of the original streams. However, huge volumes of data are now produced on the web at very high velocity. This can create a bottleneck and reduce the efficiency of RDF stream processing engines. One approach to this issue is to compress the data in the stream, reducing the delay and cost of exchanging RDF over the network. In this paper, we propose Patorc (PATtern ORiented Compression), a lossless method for compressing semantic data streams. Our approach takes advantage of two key features of RDF data streams: the regularity of their graph structure and the redundancy of part of the data. Experiments carried out on publicly available datasets demonstrate the effectiveness of our approach.
