Constructing data supply chain based on layered PROV

The inability to construct data supply chains effectively in distributed environments is becoming one of the top concerns in the big data area. To address this problem, a novel method of constructing data supply chains based on layered PROV is proposed. First, to describe abstractly how data moves from creation to distribution, PROV, the data provenance specification published by the W3C, is used to standardize the records of data activities both within and across data platforms. Then, a distributed PROV data generation algorithm for multi-platform environments is designed. Further, we propose tiered storage management of provenance based on summarization, which reduces the number of provenance records by compressing intermediate versions and thereby enables multi-level management of PROV data. In addition, we propose a hierarchical visualization technique based on a layered query mechanism, which allows users to explore the data supply chain from a general overview down to fine-grained detail. The experimental results show that the proposed approach effectively improves the performance of data supply chain construction.
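The abstract does not give the record schema, the summarization rule, or the definition of the query layers. Below is a minimal sketch of one possible interpretation, not the authors' algorithm. It assumes each data version is a PROV entity tagged with a hypothetical `platform` attribute and linked to a single source through PROV-DM's `wasDerivedFrom` relation, so every lineage is a simple chain rather than a general DAG. It also assumes that "compressing intermediate versions" means keeping only the creation point, the final version, and each version that crosses a platform boundary. `Entity`, `ProvGraph`, `summarize`, `platform_view`, and `layered_query` are illustrative names.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entity:
    id: str
    platform: str  # hypothetical attribute: the data platform holding this version

@dataclass
class ProvGraph:
    entities: dict[str, Entity] = field(default_factory=dict)
    # wasDerivedFrom edges: derived entity id -> source entity id (single source assumed)
    derived_from: dict[str, str] = field(default_factory=dict)

    def add(self, entity: Entity, source: str | None = None) -> None:
        self.entities[entity.id] = entity
        if source is not None:
            self.derived_from[entity.id] = source

    def lineage(self, entity_id: str) -> list[str]:
        """Follow wasDerivedFrom back to the original entity; return oldest first."""
        chain = [entity_id]
        while chain[-1] in self.derived_from:
            chain.append(self.derived_from[chain[-1]])
        return list(reversed(chain))

def summarize(chain: list[str], entities: dict[str, Entity]) -> list[str]:
    """Assumed compression rule: drop intermediate versions whose neighbours sit on
    the same platform; keep creation, final version, and every platform boundary."""
    kept = []
    last = len(chain) - 1
    for i, eid in enumerate(chain):
        p = entities[eid].platform
        boundary = (i > 0 and entities[chain[i - 1]].platform != p) or (
            i < last and entities[chain[i + 1]].platform != p
        )
        if i == 0 or i == last or boundary:
            kept.append(eid)
    return kept

def platform_view(chain: list[str], entities: dict[str, Entity]) -> list[str]:
    """Coarsest layer: the ordered sequence of platforms the data passed through."""
    view: list[str] = []
    for eid in chain:
        p = entities[eid].platform
        if not view or view[-1] != p:
            view.append(p)
    return view

def layered_query(graph: ProvGraph, entity_id: str, level: int) -> list[str]:
    """Level 0: platforms only; level 1: summarized versions; level >= 2: full chain
    (the full chain is what a colder storage tier would retain under this sketch)."""
    chain = graph.lineage(entity_id)
    if level == 0:
        return platform_view(chain, graph.entities)
    if level == 1:
        return summarize(chain, graph.entities)
    return chain

# Usage: three cleaning passes on platform A, then a transfer to platform B.
g = ProvGraph()
g.add(Entity("raw", "A"))
g.add(Entity("clean_v1", "A"), "raw")
g.add(Entity("clean_v2", "A"), "clean_v1")
g.add(Entity("clean_v3", "A"), "clean_v2")
g.add(Entity("copy", "B"), "clean_v3")
g.add(Entity("report", "B"), "copy")
for level in (0, 1, 2):
    print(level, layered_query(g, "report", level))
```

In this toy chain, level 1 drops `clean_v1` and `clean_v2` because both of their neighbours are also on platform A. The summarized view therefore keeps four of the six versions, while level 0 reduces the chain to the transfer from A to B. A real implementation would need to handle entities derived from several sources, which this chain-only sketch deliberately ignores.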
