Use of semantic workflows to enhance transparency and reproducibility in clinical omics

Background: Recent highly publicized cases of premature patient assignment into clinical trials, resulting from non-reproducible omics analyses, have prompted many to call for a more thorough examination of translational omics and have highlighted the critical need for transparency and reproducibility to ensure patient safety. The use of workflow platforms such as Galaxy and Taverna has greatly enhanced the use, transparency and reproducibility of omics analysis pipelines in the research domain, and such platforms would be an invaluable tool in a clinical setting. However, using these workflow platforms requires deep domain expertise that, particularly within the multi-disciplinary fields of translational and clinical omics, may not always be present in a clinical setting. This lack of domain expertise may put patient safety at risk and make these workflow platforms difficult to operationalize in a clinical setting. In contrast, semantic workflows are a different class of workflow platform in which the resultant workflow runs are transparent, reproducible, and semantically validated. Through semantic enforcement of all datasets, analyses and user-defined rules/constraints, users are guided through each workflow run, enhancing analytical validity and patient safety.

Methods: To evaluate the effectiveness of semantic workflows within translational and clinical omics, we implemented a clinical omics pipeline for annotating DNA sequence variants identified through next-generation sequencing, using the Workflow Instance Generation and Specialization (WINGS) semantic workflow platform.

Results: We found that implementing and executing our clinical omics pipeline in a semantic workflow helped us to meet the requirements for enhanced transparency, reproducibility and analytical validity recommended for clinical omics. We further found that many features of the WINGS platform were particularly well suited to supporting the critical needs of clinical omics analyses.

Conclusions: This is the first implementation and execution of a clinical omics pipeline using semantic workflows. Evaluation of this implementation provides guidance for the use of semantic workflows in both translational and clinical settings.
