linkedISA: semantic representation of ISA-Tab experimental metadata

Background
Reporting and sharing experimental metadata, such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by both humans and machines maximises their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualisation, storage and sharing of datasets, leveraging other platforms to enable analysis and publication. The ISA software suite includes several components used in an increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well primarily as a human-readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax.

Results
We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce linkedISA, a tool that converts ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights into the implementation and how annotations can be expanded, driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically rich experimental descriptions. Designed to be user-friendly, the Bio-GraphIIn interface hides most of the complexity from users: it exposes a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and it visualises descriptors to drive queries over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated their use on the linkedISA conversion of datasets from Nature's Scientific Data online publication.

Conclusions
Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, and 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
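To make the idea of converting tabular ISA-Tab metadata to RDF more concrete, the following is a minimal sketch in Python. It is not the linkedISA implementation and does not reproduce its mappings. The namespace `http://example.org/isa/`, the `hasOrganism` predicate, the helper names and the sample rows are all hypothetical, and the use of `prov:wasDerivedFrom` to link a sample to its source is an assumption made purely for illustration. The column headers follow the ISA-Tab study table convention, and the output is a list of N-Triples lines.

```python
import csv
import io
from urllib.parse import quote

# Real vocabulary IRIs (RDFS and PROV-O); their use here is an illustrative choice.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"
# Hypothetical namespace for minted resources and for the organism predicate.
BASE = "http://example.org/isa/"

# Hypothetical ISA-Tab style study table (tab-delimited), with an ontology
# annotation supplied in the "Term Accession Number" column.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tTerm Source REF\t"
    "Term Accession Number\tSample Name\n"
    "source1\tHomo sapiens\tNCBITAXON\t"
    "http://purl.obolibrary.org/obo/NCBITaxon_9606\tsample1\n"
    "source1\tHomo sapiens\tNCBITAXON\t"
    "http://purl.obolibrary.org/obo/NCBITaxon_9606\tsample2\n"
)


def iri(kind, name):
    """Mint an IRI for a table entity, percent-encoding its name."""
    return f"{BASE}{kind}/{quote(name, safe='')}"


def literal(text):
    """Format a plain literal; only backslashes and double quotes are escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def table_to_ntriples(tsv_text):
    """Turn each table row into N-Triples lines, deduplicated and sorted."""
    triples = set()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        source = iri("source", row["Source Name"])
        sample = iri("sample", row["Sample Name"])
        triples.add(f"<{source}> <{RDFS_LABEL}> {literal(row['Source Name'])} .")
        triples.add(f"<{sample}> <{RDFS_LABEL}> {literal(row['Sample Name'])} .")
        triples.add(f"<{sample}> <{PROV_DERIVED}> <{source}> .")
        accession = row["Term Accession Number"]
        if accession:
            # Reuse the user-provided ontology annotation as the object IRI.
            triples.add(f"<{source}> <{BASE}hasOrganism> <{accession}> .")
    return sorted(triples)


if __name__ == "__main__":
    for line in table_to_ntriples(STUDY_TABLE):
        print(line)
```

The point of the sketch is that an ontology term already present in the table becomes a resolvable IRI in the graph, so that a query can select sources by organism without string matching on the free-text column.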
