Implementing FAIR data management within the German Network for Bioinformatics Infrastructure (de.NBI) exemplified by selected use cases

Abstract This article presents selected use case studies and self-assessments of the FAIR status of de.NBI services. They illustrate the challenges and requirements of adhering to the FAIR (findable, accessible, interoperable and reusable) data principles in a large, distributed bioinformatics infrastructure. We address the heterogeneity of wet lab technologies, data, metadata, software and computational workflows, as well as the differing levels of implementation and monitoring of the FAIR principles across the bioinformatics sub-disciplines joined in de.NBI. On the one hand, this broad service landscape and the excellent network of experts provide a strong basis for developing useful research data management plans. On the other hand, the large number of tools and techniques maintained by distributed teams makes FAIR compliance challenging.
