Development of a cloud-based platform for reproducible science: A case study of an IUCN Red List of Ecosystems Assessment

Abstract One of the challenges of computation-centric research is making it reproducible in a form that others can repeat and re-use with minimal effort. Beyond the data and tools needed to re-run analyses, execution environments play a crucial role because results depend on the operating system and software versions used. Many of these challenges can be addressed by combining appropriate computational tools with cloud computing to provide the execution environment. Here, we demonstrate the use of Kepler scientific workflows for reproducible science that are shareable, reusable, and re-executable. Such workflows reduce barriers to sharing and save researchers time when undertaking similar research in the future. To provide infrastructure that enables reproducible science, we developed the cloud-based Collaborative Environment for Ecosystem Science Research and Analysis (CoESRA) to build, execute and share sophisticated computation-centric research. CoESRA gives users a storage and computational platform accessible from a web browser in the form of a virtual desktop, and any registered user can access the virtual desktop to build, execute and share Kepler workflows. This approach enables computational scientists to share complete workflows in a pre-configured environment so that others can reproduce the computational research with minimal effort. As a case study, we developed and shared a complete IUCN Red List of Ecosystems assessment workflow that reproduces the assessments undertaken by Burns et al. (2015) on Mountain Ash forests in the Central Highlands of Victoria, Australia. This workflow allows other researchers and stakeholders to run the assessment with minimal supervision, and to re-evaluate it as additional data become available.
The assessment can be run in a CoESRA virtual desktop by opening the workflow in the Kepler user interface and pressing a “start” button. The workflow is pre-configured with all of the open-access datasets and writes its results to a pre-configured folder.
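The “one-button” pattern described above — all inputs staged in fixed locations, results written to a fixed output folder — can be sketched outside of Kepler as well. The following is a minimal Python illustration of that pattern, not the actual Kepler workflow; the folder layout, file names, and the trivial “assessment” step (a record count per dataset) are hypothetical placeholders standing in for the real Red List of Ecosystems criteria.

```python
import json
from pathlib import Path


def run_assessment(input_dir: Path, output_dir: Path) -> Path:
    """Apply a placeholder assessment step to every pre-staged dataset
    in input_dir and write one results file to output_dir, mirroring
    the pre-configured, single-action execution pattern."""
    output_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for dataset in sorted(input_dir.glob("*.json")):
        records = json.loads(dataset.read_text())
        # Placeholder "criterion": just count the records per dataset.
        results[dataset.stem] = {"n_records": len(records)}
    out_path = output_dir / "assessment_results.json"
    out_path.write_text(json.dumps(results, indent=2))
    return out_path
```

Because every path is fixed in advance, re-running the function is the only action a user needs to take — the same property that lets the CoESRA workflow be re-executed, or re-evaluated against updated data, without manual reconfiguration.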

[1]  Anton Nekrutenko,et al.  Ten Simple Rules for Reproducible Computational Research , 2013, PLoS Comput. Biol..

[2]  Tom Oinn,et al.  Taverna: lessons in creating a workflow environment for the life sciences , 2006, Concurr. Comput. Pract. Exp..

[3]  Victoria Stodden,et al.  Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research , 2014 .

[4]  D. Ashton The Seasonal Growth of Eucalyptus regnans F. Muell , 1975 .

[5]  I. Cockburn,et al.  The Economics of Reproducibility in Preclinical Research , 2015, PLoS biology.

[6]  Philippe Bonnet,et al.  Computational reproducibility: state-of-the-art, challenges, and database research opportunities , 2012, SIGMOD Conference.

[7]  Brian A. Nosek,et al.  Promoting an open research culture , 2015, Science.

[8]  Brian A. Nosek,et al.  Recommendations for Increasing Replicability in Psychology , 2013 .

[9]  David B. Lindenmayer,et al.  Re-evaluation of forest biomass carbon stocks and lessons from the world's most carbon-dense forests , 2009, Proceedings of the National Academy of Sciences.

[10]  Edward A. Lee,et al.  Scientific workflow management and the Kepler system , 2006, Concurr. Comput. Pract. Exp..

[11]  David Abramson,et al.  Nimrod/K: Towards massively parallel dynamic Grid workflows , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[12]  David B. Lindenmayer,et al.  Newly discovered landscape traps produce regime shifts in wet forests , 2011, Proceedings of the National Academy of Sciences.

[13]  Arian Maleki,et al.  Reproducible Research in Computational Harmonic Analysis , 2009, Computing in Science & Engineering.

[14]  Paul Smith,et al.  Must try harder. , 1988, The Health service journal.

[15]  Kenneth M. Yamada,et al.  Reproducibility and cell biology , 2015, The Journal of cell biology.

[16]  Mark A. Burgman,et al.  Scientific Foundations for an IUCN Red List of Ecosystems , 2013, PloS one.

[17]  J Hilliard,et al.  Again and Again and Again , 2005 .

[18]  M. Spalding,et al.  A practical guide to the application of the IUCN Red List of Ecosystems criteria , 2015, Philosophical Transactions of the Royal Society B: Biological Sciences.

[19]  David Abramson,et al.  A reusable scientific workflow for conservation planning , 2015 .

[20]  Yolanda Gil,et al.  Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome , 2013, PloS one.

[21]  F. Collins,et al.  Policy: NIH plans to enhance reproducibility , 2014, Nature.

[22]  TIM M. BLACKBURN,et al.  Reproducibility and Repeatability in Ecology , 2006 .

[23]  Cláudio T. Silva,et al.  VisTrails: enabling interactive multiple-view visualizations , 2005, VIS 05. IEEE Visualization, 2005..

[24]  Bertram Ludäscher,et al.  Kepler: an extensible system for design and execution of scientific workflows , 2004, Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004..

[25]  R. Peng Reproducible Research in Computational Science , 2011, Science.

[26]  Kurt D. Zeilenga Lightweight Directory Access Protocol (LDAP): Technical Specification Road Map , 2006, RFC.

[27]  James Loope Managing Infrastructure with Puppet , 2011 .

[28]  Ben Collen,et al.  Establishing IUCN Red List Criteria for Threatened Ecosystems , 2010, Conservation biology : the journal of the Society for Conservation Biology.

[29]  Domenico Talia,et al.  Workflow Systems for Science: Concepts and Tools , 2013 .

[30]  Jeffrey T. Leek,et al.  Opinion: Reproducible research can still be wrong: Adopting a prevention approach , 2015, Proceedings of the National Academy of Sciences.

[31]  D. Lindenmayer,et al.  Melbourne's Water Catchments: Perspectives on a World-Class Water Supply , 2013 .

[32]  Ian M. Mitchell,et al.  Best Practices for Scientific Computing , 2012, PLoS biology.

[33]  Brian A. Nosek,et al.  An open investigation of the reproducibility of cancer biology research , 2014, eLife.

[34]  D. Lindenmayer,et al.  Ecosystem assessment of mountain ash forest in the Central Highlands of Victoria, south‐eastern Australia , 2015 .

[35]  Jill P Mesirov,et al.  Accessible Reproducible Research , 2010, Science.

[36]  Amye Kenall,et al.  Better reporting for better research: a checklist for reproducibility , 2015, GigaScience.

[37]  Leighton Pritchard,et al.  Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology , 2013, PeerJ.

[38]  B. Jasny,et al.  Again, and Again, and Again … , 2011 .

[39]  Robert K. Abercrombie,et al.  A Computing Environment to Support Repeatable Scientific Big Data Experimentation of World-Wide Scientific Literature , 2015, ISSI.

[40]  Idafen Santana-Perez,et al.  Towards Reproducibility in Scientific Workflows: An Infrastructure-Based Approach , 2015, Sci. Program..