Computational gene annotation in new genome assemblies using GeneID.

The sequences of many eukaryotic genomes are now available from a personal computer to any researcher in the worldwide scientific community. However, these sequences are of little value without adequate annotation of their biologically meaningful elements. Gene annotation, in particular, is a challenging task that cannot be tackled without the aid of dedicated bioinformatics tools. In this chapter we present a simple protocol, based mainly on the program GeneID combined with other computational tools, to locate in the recently assembled genome of D. yakuba a gene previously annotated in D. melanogaster.
