Modelling Provenance Collection Points and Their Impact on Provenance Graphs

As many domains employ ever more complex systems-of-systems, capturing provenance among component systems is increasingly important. Applications such as intrusion detection, load balancing, traffic routing, and insider threat detection all involve monitoring and analyzing data provenance. Implicit in these applications is the assumption that "good" provenance is captured, e.g., complete provenance graphs or one full path. When attempting to provide "good" provenance for a complex system-of-systems, it is necessary to know how hard the provenance-enabling will be and what quality of provenance is likely to be produced. In this work, we provide analytical results and simulation tools to assist in scoping the provenance-enabling process. We provide use cases of complex systems-of-systems within which users wish to capture provenance. We describe the parameters that must be taken into account when provenance-enabling a system-of-systems. We provide a tool that models the interactions and types of capture agents involved in a complex system-of-systems, including the set of known and unknown systems in the environment. The tool estimates the quantity and type of capture agents that will need to be deployed to provenance-enable a complex system that is not completely known.
