User-friendly Composition of FAIR Workflows in a Notebook Environment

Recent years have seen considerable effort to make scientific research assets findable, accessible, interoperable and reusable, a set of goals collectively known as the FAIR principles. One particular area of focus is applying these principles to scientific computational workflows. Jupyter notebooks are a very popular medium for programming and communicating computational scientific analyses. However, it is hard to reuse only particular steps of a notebook analysis without disrupting the usual flow and benefits of the notebook approach, which makes full compliance with the FAIR principles difficult. Here we present an approach and toolset for adding the power of semantic technologies to Python-encoded scientific workflows in a simple, automated and minimally intrusive manner. The semantic descriptions are published as a series of nanopublications that can be searched and used in other notebooks by means of a JupyterLab plugin. We describe the implementation of the proposed approach and toolset, and report the results of a user study with 15 participants, designed around image processing workflows, that evaluated the usability of the system and its perceived effect on FAIRness. Our results show that the approach is feasible and perceived as user-friendly. The system received an overall score of 78.75 on the System Usability Scale, which is above the average score reported in the literature.
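To make the idea of a minimally intrusive annotation concrete, the following is a minimal sketch and not the toolset's actual API. It assumes a hypothetical decorator `describe_step` and a hypothetical in-memory `STEP_REGISTRY` standing in for published nanopublications; the function `invert` and the helper `find_steps` are likewise illustrative names only.

```python
import inspect

# Hypothetical in-memory registry. In the approach described above, step
# descriptions are published as nanopublications; a dict stands in here.
STEP_REGISTRY = {}


def describe_step(label):
    """Hypothetical decorator that records a small description of a step.

    The decorated function is returned unchanged, so the notebook code
    keeps running exactly as before (the 'minimally intrusive' part).
    """
    def decorator(func):
        signature = inspect.signature(func)
        STEP_REGISTRY[func.__name__] = {
            "label": label,
            "inputs": list(signature.parameters),
            "doc": inspect.getdoc(func) or "",
        }
        return func
    return decorator


@describe_step(label="Invert an 8-bit grayscale image")
def invert(pixels):
    """Return the image with every 8-bit pixel value inverted."""
    return [[255 - value for value in row] for row in pixels]


def find_steps(keyword):
    """Return the names of registered steps whose label mentions keyword."""
    keyword = keyword.lower()
    return [
        name
        for name, description in STEP_REGISTRY.items()
        if keyword in description["label"].lower()
    ]
```

In this sketch, another notebook could call `find_steps("invert")` to locate the step by its description and then reuse `invert` on its own, without rerunning the rest of the original analysis.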
