The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation

Motivation: Biomedical ontologists to date have concentrated on ontological descriptions of biomedical entities such as gene products and their attributes, phenotypes and so on. Recently, effort has diversified to descriptions of the laboratory investigations by which these entities were produced. However, much biological insight is gained from the analysis of the data produced from these investigations, and there is a lack of adequate descriptions of the wide range of software that is central to bioinformatics. We need to describe how data are analyzed for discovery, audit trails, provenance and reproducibility.

Results: The Software Ontology (SWO) is a description of software used to store, manage and analyze data. Input to the SWO has come from beyond the life sciences, but its main focus is the life sciences. We used agile techniques to gather input for the SWO and maintain engagement with our users. The result is an ontology that meets the needs of a broad range of users by describing software, its information processing tasks, data inputs and outputs, data formats, versions and so on. Recently, the SWO has incorporated EDAM, a vocabulary for describing data and related concepts in bioinformatics. The SWO is currently being used to describe software used in multiple biomedical applications.

Conclusion: The SWO is another element of the biomedical ontology landscape that is necessary for the description of biomedical entities and how they were discovered. An ontology of software used to analyze data produced by investigations in the life sciences can be made in such a way that it covers the important features requested and prioritized by its users. The SWO thus fits into the landscape of biomedical ontologies and is produced using techniques designed to keep it in line with users' needs.

Availability: The Software Ontology is available under an Apache 2.0 license at http://theswo.sourceforge.net/; the Software Ontology blog can be read at http://softwareontology.wordpress.com.
