Provenance compression scheme based on graph patterns for large RDF documents

Provenance data are metadata that record the source information or modification history of other data. Because provenance grows every time the source data are modified, it can become dozens of times larger than the original data. Schemes that efficiently compress large volumes of provenance data are therefore required. In this paper, we propose a new resource description framework (RDF) provenance compression scheme that considers graph patterns. The proposed scheme reduces the space occupied by string data by converting the provenance data into numeric data through dictionary encoding. Unlike existing provenance compression schemes, the proposed scheme uses some RDF documents to manage the source RDF documents on the semantic web, so that changes in the provenance data can be tracked. It reduces storage space by compressing the source RDF documents according to their patterns, and it compresses the provenance data according to the patterns of active nodes in the PROV model. Compressing along the provenance flow in this way improves compression performance. A performance evaluation shows the advantage of the proposed scheme in terms of compression rate and processing time.
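The dictionary encoding step mentioned above can be illustrated with a minimal Python sketch. It is not the scheme proposed in the paper: the function names `dictionary_encode` and `dictionary_decode`, the `ex:` resources, and the choice to assign identifiers in order of first appearance are all assumptions made for illustration. It only shows the general idea of replacing each repeated string term in a set of triples with a small integer.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


def dictionary_encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[EncodedTriple]]:
    """Map every distinct term to an integer ID (hypothetical helper).

    IDs are assigned in order of first appearance, starting at 1.
    """
    term_to_id: Dict[str, int] = {}
    encoded: List[EncodedTriple] = []
    for s, p, o in triples:
        ids = []
        for term in (s, p, o):
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id) + 1
            ids.append(term_to_id[term])
        encoded.append((ids[0], ids[1], ids[2]))
    return term_to_id, encoded


def dictionary_decode(term_to_id: Dict[str, int], encoded: List[EncodedTriple]) -> List[Triple]:
    """Invert the dictionary to recover the original string triples."""
    id_to_term = {i: t for t, i in term_to_id.items()}
    return [(id_to_term[s], id_to_term[p], id_to_term[o]) for s, p, o in encoded]


# Illustrative provenance triples; the ex: resources are made up,
# the prov: properties come from the W3C PROV vocabulary.
triples = [
    ("ex:report_v2", "prov:wasDerivedFrom", "ex:report_v1"),
    ("ex:report_v2", "prov:wasGeneratedBy", "ex:edit_1"),
    ("ex:edit_1", "prov:used", "ex:report_v1"),
]

dictionary, encoded = dictionary_encode(triples)
print(encoded)  # [(1, 2, 3), (1, 4, 5), (5, 6, 3)]
```

Each term string is stored once in the dictionary, and the triples themselves shrink to tuples of integers. The saving grows with the number of times a term is repeated, which is the typical situation in provenance data where the same entities and properties recur across many modification records.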
