Toward open and reproducible environmental modeling by integrating online data repositories, computational environments, and model Application Programming Interfaces

Abstract: Cyberinfrastructure needs to be advanced to enable open and reproducible environmental modeling research. Recent efforts toward this goal have focused on three components: online repositories for sharing data and models; online computational environments, together with containerization technology and notebooks, for capturing reproducible computational studies; and Application Programming Interfaces (APIs) for simulation models that foster intuitive programmatic control. The objective of this research is to show how these efforts can be integrated to support reproducible environmental modeling. We first present the high-level concept and general approach for integrating these three components. We then present one possible implementation that integrates HydroShare (an online repository), CUAHSI JupyterHub and CyberGIS-Jupyter for Water (computational environments), and pySUMMA (a model API) to support open and reproducible hydrologic modeling. We apply this example implementation to a hydrologic modeling use case to demonstrate how the approach can advance reproducible environmental modeling through the seamless integration of cyberinfrastructure services.
