Comparative gene finding in chicken indicates that we are closing in on the set of multi-exonic widely expressed human genes

The recent availability of the chicken genome sequence raises the question of whether there are human protein-coding genes, conserved in chicken, that are not yet included in the human gene catalog. Here, using comparative gene finding followed by experimental verification of exon pairs by RT–PCR, we show that the multi-exonic subset of this catalog may need to grow by as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening applied to the predicted genes to filter out false positives cannot handle intronless genes; and (ii) the experimental verification could fail to detect genes that are expressed only at a specific developmental time. This highlights the importance of developing methods that can reliably estimate the number of intronless genes and of genes with such restricted expression.
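One common rationale for testing predictions as exon pairs by RT–PCR is that primers placed in two different predicted exons give a shorter product from spliced mRNA (via cDNA) than from genomic DNA, which still contains the intervening intron(s). The sketch below illustrates only that size logic. The gene model, its coordinates, the primer positions and the `expected_amplicons` helper are all hypothetical, not taken from the authors' pipeline. A single-exon (intronless) prediction offers no exon pair to test this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """A predicted exon on the plus strand (0-based, half-open: [start, end))."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def expected_amplicons(exons, fwd_exon, fwd_pos, rev_exon, rev_pos):
    """Return (cdna_size, genomic_size) for a primer pair spanning two exons.

    Hypothetical helper: fwd_pos is the genomic coordinate of the forward
    primer's 5' end inside exons[fwd_exon]; rev_pos is the coordinate just past
    the reverse primer's binding site inside exons[rev_exon]. Exons are assumed
    sorted by position and on the plus strand.
    """
    if fwd_exon >= rev_exon:
        raise ValueError("primers must lie in two different exons, forward first")
    first, last = exons[fwd_exon], exons[rev_exon]
    if not (first.start <= fwd_pos < first.end and last.start < rev_pos <= last.end):
        raise ValueError("primer positions must fall inside their exons")
    # Genomic template: everything between the primers, introns included.
    genomic = rev_pos - fwd_pos
    # Spliced template: rest of the first exon, all exons in between,
    # and the part of the last exon up to the reverse primer site.
    cdna = first.end - fwd_pos
    cdna += sum(e.length for e in exons[fwd_exon + 1:rev_exon])
    cdna += rev_pos - last.start
    return cdna, genomic


# Hypothetical three-exon prediction with introns of 900 and 1,400 bp.
model = [Exon(0, 150), Exon(1050, 1170), Exon(2570, 2800)]
cdna_size, genomic_size = expected_amplicons(model, 0, 100, 2, 2700)
print(cdna_size, genomic_size)  # prints "300 2600"
```

The 2,300 bp difference between the two products equals the summed intron lengths. A band at the cDNA size therefore points to a spliced transcript rather than genomic contamination. Sequencing that band would then check the predicted exon junctions.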
