Gene finding in the chicken genome

Background: Despite the continuous production of genome sequences for a number of organisms, reliable, comprehensive, and cost-effective gene prediction remains problematic. This is particularly true for genomes that lack a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test whether comparative and homology-based gene-finding methods, followed by experimental validation, constitute an effective genome annotation method.

Results: We experimentally evaluated, by RT-PCR, the predictions of three computational gene finders (Ensembl, SGP2 and TWINSCAN) applied to the chicken genome. We computed a Venn diagram of the three prediction sets and evaluated each of its components. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end.

Conclusions: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of newly sequenced genomes provided by standard homology-based methods.
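The Venn diagram mentioned in the results can be illustrated with a minimal sketch. It assumes that the predictions of each gene finder have already been reduced to a set of shared locus identifiers; the function name `venn_regions` and the identifiers `g1` to `g7` are hypothetical, and the abstract does not say how predictions from different programs were matched to one another.

```python
from itertools import combinations


def venn_regions(predictions):
    """Partition loci into exclusive Venn regions.

    predictions: dict mapping a gene-finder name to a set of locus IDs.
    Returns a dict mapping a frozenset of finder names to the loci
    predicted by exactly those finders and by no others.
    """
    names = sorted(predictions)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Loci predicted by every finder in this combination.
            inside = set.intersection(*(predictions[n] for n in combo))
            # Loci predicted by any finder outside the combination.
            others = [predictions[n] for n in names if n not in combo]
            outside = set().union(*others)
            regions[frozenset(combo)] = inside - outside
    return regions


# Hypothetical locus IDs, for illustration only (not data from the study).
example = {
    "Ensembl": {"g1", "g2", "g3", "g4"},
    "SGP2": {"g2", "g3", "g5", "g6"},
    "TWINSCAN": {"g3", "g4", "g6", "g7"},
}
regions = venn_regions(example)
for finders, loci in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(finders), sorted(loci))
```

With three programs this yields seven exclusive regions, each of which could then be sampled separately for experimental testing.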
