Stealthy annotation of experimental biology by spreadsheets

The increase in volume and complexity of biological data has led to increased requirements to reuse that data. Consistent and accurate metadata is essential for this task, creating new challenges in semantic data annotation and in the construction of terminologies and ontologies used for annotation. The BioSharing community are developing standards and terminologies for annotation, which have been adopted across bioinformatics, but the real challenge is to make these standards accessible to laboratory scientists. Widespread adoption requires the provision of tools that assist scientists whilst reducing the complexities of working with semantics. This paper describes unobtrusive ‘stealthy’ methods for collecting standards-compliant, semantically annotated data and for contributing to the ontologies used for those annotations. Spreadsheets are ubiquitous in laboratory data management. Our spreadsheet‐based RightField tool enables scientists to structure information and select ontology terms for annotation within spreadsheets, producing high-quality, consistent data without changing common working practices. Furthermore, our Populous spreadsheet tool proves effective for gathering domain knowledge in the form of Web Ontology Language (OWL) ontologies. Such a corpus of structured and semantically enriched knowledge can be extracted as Resource Description Framework (RDF), providing further means for searching across the content and contributing to Open Linked Data (http://linkeddata.org/). Copyright © 2012 John Wiley & Sons, Ltd.
