Consistent over-estimation of gene number in complex plant genomes.

The first comprehensive comparison of gene content between higher plant species provided the unexpected conclusions that rice contained about twice as many genes as Arabidopsis, and that about half of the rice genes had no obvious homologs in any other organism. Our subsequent analyses indicate that most of these "extra, novel" rice genes are mis-annotated segments of transposable elements, especially retrotransposons. Aggressive annotation of a randomly selected subset of the rice genome suggests that the gene number is fewer than 40,000. The five fantasies of automated plant gene discovery are described, and a protocol is provided to minimize (or at least predict) the inaccuracy of future plant genome annotations.
