Towards standardization guidelines for in silico approaches in personalized medicine

Abstract Despite the ever-progressing technological advances in producing data in health and clinical research, the generation of new knowledge for medical benefits through advanced analytics still falls short of its full potential. Reasons for this shortfall include the inherent heterogeneity of data sources and the lack of broadly accepted standards. Further hurdles are associated with legal and ethical issues surrounding the use of personal/patient data across disciplines and borders. Consequently, there is a need for broadly applicable standards, compliant with legal and ethical regulations, that allow interpretation of heterogeneous health data through in silico methodologies to advance personalized medicine. To tackle these standardization challenges, the Horizon2020 Coordinating and Support Action EU-STANDS4PM initiated an EU-wide mapping process to evaluate strategies for data integration and data-driven in silico modelling approaches to develop standards, recommendations and guidelines for personalized medicine. A first step towards this goal is a broad stakeholder consultation process initiated by an EU-STANDS4PM workshop at the annual COMBINE meeting (COMBINE 2019 workshop report in same issue). This forum analysed the status quo of data and model standards and reflected on possibilities as well as challenges for cross-domain data integration to facilitate in silico modelling approaches for personalized medicine.
