The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again

BackgroundThe ISA-Tab format and software suite have been developed to break the silo effect induced by technology-specific formats for a variety of data types and to better support experimental metadata tracking. Experimentalists seldom use a single technique to monitor biological signals. Providing a multi-purpose, pragmatic and accessible format that abstracts away common constructs for describing Investigations, Studies and Assays, ISA is increasingly popular. To attract further interest towards the format and extend support to ensure reproducible research and reusable data, we present the Risa package, which delivers a central component to support the ISA format by enabling effortless integration with R, the popular, open source data crunching environment.ResultsThe Risa package bridges the gap between the metadata collection and curation in an ISA-compliant way and the data analysis using the widely used statistical computing environment R. The package offers functionality for: i) parsing ISA-Tab datasets into R objects, ii) augmenting annotation with extra metadata not explicitly stated in the ISA syntax; iii) interfacing with domain specific R packages iv) suggesting potentially useful R packages available in Bioconductor for subsequent processing of the experimental data described in the ISA format; and finally v) saving back to ISA-Tab files augmented with analysis specific metadata from R. We demonstrate these features by presenting use cases for mass spectrometry data and DNA microarray data.ConclusionsThe Risa package is open source (with LGPL license) and freely available through Bioconductor. By making Risa available, we aim to facilitate the task of processing experimental data, encouraging a uniform representation of experimental information and results while delivering tools for ensuring traceability and provenance tracking.Software availabilityThe Risa package is available since Bioconductor 2.11 (version 1.0.0) and version 1.2.1 appeared in Bioconductor 2.12, both along with documentation and examples. The latest version of the code is at the development branch in Bioconductor and can also be accessed from GitHub https://github.com/ISA-tools/Risa, where the issue tracker allows users to report bugs or feature requests.

[1]  Paul T. Spellman,et al.  A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB , 2006, BMC Bioinformatics.

[2]  Patricia L. Whetzel,et al.  OntoMaton: a Bioportal powered ontology widget for Google Spreadsheets , 2012, Bioinform..

[3]  Jean YH Yang,et al.  Bioconductor: open software development for computational biology and bioinformatics , 2004, Genome Biology.

[4]  R. Mead,et al.  The Design of Experiments: Statistical Principles for Practical Applications. , 1989 .

[5]  A. Saghatelian,et al.  Assignment of endogenous substrates to enzymes by global metabolite profiling. , 2004, Biochemistry.

[6]  Johannes Griss,et al.  The Proteomics Identifications (PRIDE) database and associated tools: status in 2013 , 2012, Nucleic Acids Res..

[7]  Christoph Steinbeck,et al.  MetaboLights—an open-access general-purpose repository for metabolomics studies and associated meta-data , 2012, Nucleic Acids Res..

[8]  Steffen Neumann,et al.  Nearline acquisition and processing of liquid chromatography-tandem mass spectrometry data , 2012, Metabolomics.

[9]  S. Neumann,et al.  CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. , 2012, Analytical chemistry.

[10]  Ying Cheng,et al.  The European Nucleotide Archive , 2010, Nucleic Acids Res..

[11]  Wolfram Goessling,et al.  The Wnt/β-Catenin Pathway Is Required for the Development of Leukemia Stem Cells in AML , 2010, Science.

[12]  Chris F. Taylor,et al.  The MGED Ontology: a resource for semantics-based description of microarray experiments , 2006, Bioinform..

[13]  Oliver Hofmann,et al.  ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level , 2010, Bioinform..

[14]  Anne E. Trefethen,et al.  Toward interoperable bioscience data , 2012, Nature Genetics.

[15]  Maria Keays,et al.  ArrayExpress update—trends in database growth and links to data analysis tools , 2012, Nucleic Acids Res..

[16]  Susanna-Assunta Sansone,et al.  The open source ISA software suite and its international user community: knowledge management of experimental data , 2012 .

[17]  Oliver Hofmann,et al.  The Stem Cell Discovery Engine: an integrated repository and analysis system for cancer stem cell comparisons , 2011, Nucleic Acids Res..

[18]  Maria De Iorio,et al.  BATMAN - an R package for the automated quantification of metabolites from nuclear magnetic resonance spectra using a Bayesian model , 2012, Bioinform..

[19]  Lennart Martens,et al.  The minimum information about a proteomics experiment (MIAPE) , 2007, Nature Biotechnology.

[20]  Sean R. Davis,et al.  NCBI GEO: archive for functional genomics data sets—update , 2012, Nucleic Acids Res..

[21]  Jessica A. Turner,et al.  Modeling biomedical experimental processes with OBI , 2010, J. Biomed. Semant..

[22]  Csongor Nyulas,et al.  BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications , 2011, Nucleic Acids Res..

[23]  W. Huber,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[24]  Emily S. Charlson,et al.  Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications , 2011, Nature Biotechnology.

[25]  Jason E. Stewart,et al.  Minimum information about a microarray experiment (MIAME)—toward standards for microarray data , 2001, Nature Genetics.

[26]  Benjamin M. Bolstad,et al.  affy - analysis of Affymetrix GeneChip data at the probe level , 2004, Bioinform..

[27]  P. Breeze +620. Price £65 R. Mead , 1989, Animal Behaviour.

[28]  R. Abagyan,et al.  XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. , 2006, Analytical chemistry.

[29]  Lennart Martens,et al.  mzML—a Community Standard for Mass Spectrometry Data* , 2010, Molecular & Cellular Proteomics.