Enabling precision medicine via standard communication of HTS provenance, analysis, and results

A personalized approach based on a patient’s or pathogen’s unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to the FAIR (findable, accessible, interoperable, reusable) guiding principles. Effective precision medicine also requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://osf.io/zm97b/) enables standardized reporting of genomic sequence data provenance through a provenance domain, usability domain, execution domain, verification kit, and error domain. The framework facilitates communication and promotes interoperability: bioinformatics computation instances that employ it can be easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of these tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed-upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects), following the “Open-Stand.org principles for collaborative open standards development”. By communicating high-throughput sequencing (HTS) studies as BCOs, regulatory agencies (e.g., FDA), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.
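As a rough illustration of the five parts named above, the following Python sketch models a BCO-like record as a plain dictionary and checks that each part is present. It is not the official BioCompute schema: the key spellings, the nested field names, and the helper `missing_sections` are all hypothetical, chosen only to mirror the wording of the abstract.

```python
import json

# Section names mirror the parts listed in the abstract.
# The exact key spellings are assumptions, not the official schema.
REQUIRED_SECTIONS = (
    "provenance_domain",
    "usability_domain",
    "execution_domain",
    "verification_kit",
    "error_domain",
)


def missing_sections(record):
    """Return the required sections that are absent or empty in `record`."""
    return [name for name in REQUIRED_SECTIONS if not record.get(name)]


# Hypothetical record; every value below is a placeholder for illustration.
example_record = {
    "provenance_domain": {"name": "example HTS analysis", "version": "1.0"},
    "usability_domain": ["Placeholder description of the intended use."],
    "execution_domain": {"script": "run_pipeline.sh"},
    "verification_kit": {"inputs": ["reads.fastq"], "expected_outputs": ["variants.vcf"]},
    "error_domain": {"notes": "Placeholder description of known error sources."},
}

if __name__ == "__main__":
    # A serialized form is what would be exchanged between parties.
    print(json.dumps(example_record, indent=2))
    print("Missing sections:", missing_sections(example_record))
```

A completeness check of this kind is one simple way a recipient (for example, a reviewer) might confirm that a submitted record covers every part before comparing or repeating the computation.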
