Towards FAIR protocols and workflows: The OpenPREDICT case study

It is essential for the advancement of science that researchers share, reuse and reproduce the workflows and protocols used by others. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and they emphasize the means by which digital objects are found and reused by others. How to apply these principles not just to static input and output data but also to the dynamic workflows and protocols that consume and produce them is still under debate and poses a number of challenges. In this paper we describe our inclusive and overarching approach to applying the FAIR principles to workflows and protocols, and we demonstrate its benefits. We apply and evaluate our approach on a case study that consists of making the PREDICT workflow, a highly cited drug repurposing workflow, open and FAIR. This includes FAIRification of the involved datasets, as well as applying semantic technologies to represent and store data about the detailed versions of the general protocol, of the concrete workflow instructions, and of their execution traces. We propose a semantic model to better address these specific requirements and evaluate it by answering competency questions. The model consists of classes and relations from a number of existing ontologies, including Workflow4ever, PROV, EDAM, and BPMN, which in turn allowed us to formulate and answer new kinds of competency questions. Our evaluation shows the high degree to which our FAIRified OpenPREDICT workflow now adheres to the FAIR principles, as well as the practicality and usefulness of being able to answer our new competency questions.
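To illustrate what answering a competency question over protocol steps and execution traces can look like, the following is a minimal sketch in plain Python. It is not the OpenPREDICT model itself: only `prov:used` and `prov:wasGeneratedBy` are real PROV properties, while every `ex:` identifier (steps, runs, datasets, and the `ex:executes` relation) is a hypothetical name introduced for this example.

```python
# Toy triple store: (subject, predicate, object).
# All "ex:" names are hypothetical; only the prov: properties come from PROV.
TRIPLES = {
    ("ex:step_load_data", "rdf:type", "ex:ProtocolStep"),
    ("ex:step_train_model", "rdf:type", "ex:ProtocolStep"),
    ("ex:run_load_data", "ex:executes", "ex:step_load_data"),
    ("ex:run_train_model", "ex:executes", "ex:step_train_model"),
    ("ex:run_load_data", "prov:used", "ex:drug_dataset"),
    ("ex:features", "prov:wasGeneratedBy", "ex:run_load_data"),
    ("ex:run_train_model", "prov:used", "ex:features"),
    ("ex:model", "prov:wasGeneratedBy", "ex:run_train_model"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def datasets_used_by_step(step):
    """Competency question: which entities did executions of this step use?"""
    runs = [s for s, _, _ in match(p="ex:executes", o=step)]
    return sorted(o for run in runs for _, _, o in match(s=run, p="prov:used"))


def upstream_inputs(entity):
    """Competency question: which entities contributed, directly or
    indirectly, to producing this entity according to the execution trace?"""
    found = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for _, _, run in match(s=current, p="prov:wasGeneratedBy"):
            for _, _, used in match(s=run, p="prov:used"):
                if used not in found:
                    found.add(used)
                    stack.append(used)
    return sorted(found)


if __name__ == "__main__":
    print(datasets_used_by_step("ex:step_train_model"))
    print(upstream_inputs("ex:model"))
```

In practice such questions would more likely be posed as SPARQL queries against an RDF store; the in-memory set of tuples here only stands in for that to keep the sketch dependency-free.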
