Implementations of fine-grained automated data provenance to support transparent environmental modelling

Abstract: Demand is increasing for greater transparency in the science underpinning decision-making in land resource management. To illustrate how fine-grained data provenance can increase the credibility and transparency of scientific methods and outputs, we implement provenance tracking for two modelling frameworks, pyluc and LUMASS, and present results from example models. Pyluc is a Python-based framework for generating spatial land use classification data together with automatically generated technical documentation. LUMASS is a spatial modelling and optimisation framework within which New Zealand's sediment budget model, SedNetNZ, is implemented. In both cases, detailed provenance tracking produced information so complex that we developed an interactive data provenance visualisation tool to help science producers and users explore, verify, and understand model outputs. We argue that best practice in data management and sharing should include fine-grained data provenance to meet demands for the quality and integrity of science-based data and information.
