Linking Genome Annotation Projects with Genetic Disorders using Ontologies

Genome sequencing projects generate vast amounts of data, of widely varying types and complexity, at a growing pace. Traditionally, the annotation of such sequences was difficult to share with other researchers. Although the development and application of biological ontologies has improved this situation, annotation efforts remain isolated because only a limited amount of information from other annotation projects can be reused. Moreover, they do not benefit from the translational information available for the genomic sequences. In this paper, we describe a system that supports genome annotation processes by providing information about orthologous genes and the genetic disorders that can be associated with a gene identified in a sequence. The seamless integration of such data will be facilitated by an ontological infrastructure which, following best practices in ontology engineering, will reuse existing biological ontologies such as the Sequence Ontology and the Ontological Gene Orthology.
