ISA API: An open platform for interoperable life science experimental metadata

Background: The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open-source community specifications and software tools for enabling discovery, exchange and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, a JSON serialization, ISA-JSON, was developed.

Results: In this work, we present the ISA API, a Python library for creating, editing, parsing, and validating the ISA-Tab and ISA-JSON formats through a common data model implemented as Python class objects. We describe the ISA API feature set, its early adopters and its growing user community.

Conclusions: The ISA API provides users with rich programmatic metadata handling functionality to support automation, a common interface and an interoperable medium between the two ISA formats, as well as conversion to and from other life science data formats required for depositing data in public databases.
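The idea of one in-memory data model serving both a tabular and a JSON serialization can be illustrated with a minimal sketch. This is not the ISA API itself: the class names mirror the Investigation/Study/Assay hierarchy, but the fields, the `to_json` and `to_tab` helpers, and the column headers are hypothetical and chosen only for illustration, using the Python standard library.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field
from typing import List


# Hypothetical, simplified stand-ins for an Investigation/Study/Assay model.
@dataclass
class Assay:
    filename: str
    measurement_type: str


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def to_json(investigation: Investigation) -> str:
    """Serialize the nested object model as a JSON document."""
    return json.dumps(asdict(investigation), indent=2)


def to_tab(investigation: Investigation) -> str:
    """Flatten the same model into tab-separated rows, one row per assay.

    The column headers are illustrative and do not follow the ISA-Tab
    specification.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(
        ["Study Identifier", "Study Title", "Assay File Name", "Measurement Type"]
    )
    for study in investigation.studies:
        for assay in study.assays:
            writer.writerow(
                [study.identifier, study.title, assay.filename, assay.measurement_type]
            )
    return buffer.getvalue()


# Build the model once, then emit both serializations from it.
investigation = Investigation(
    identifier="i1",
    title="Example investigation",
    studies=[
        Study("s1", "Example study", [Assay("a_assay.txt", "metabolite profiling")])
    ],
)
json_text = to_json(investigation)
tab_text = to_tab(investigation)
```

Because both writers read from the same objects, a change to the model is reflected in both outputs, which is the property the abstract attributes to a common data model.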
