Identification and Characterization of Lineage-Specific Genes within the Poaceae

We used the rice (Oryza sativa ssp. japonica) genome annotation together with genomic sequence and clustered transcript assemblies from 184 species in the plant kingdom. From these, we identified a set of 861 rice genes that are evolutionarily conserved among six diverse species within the Poaceae but lack significant sequence similarity to plant species outside the Poaceae. We term this set of evolutionarily conserved, lineage-specific rice genes conserved Poaceae-specific genes (CPSGs), reflecting the presence of significant sequence similarity across three separate Poaceae subfamilies. The vast majority of rice CPSGs (86.6%) encode proteins with no putative function or functionally characterized protein domain. Of the remaining CPSGs, 8.8% of the total encode an F-box domain-containing protein and 4.5% encode a protein with a putative function. On average, CPSGs have fewer exons, shorter total gene length, and higher GC content than genes annotated as transposable elements (TEs) or genes with significant sequence similarity to a species outside the Poaceae. Multiple sequence alignments of the CPSGs with sequences from other Poaceae species show conservation across a putative domain, a novel domain, or the entire coding length of the protein. At the genome level, 103 of the 861 rice CPSGs (12.0%) could be placed in syntenic alignments with sorghum (Sorghum bicolor), demonstrating an additional level of conservation for this set of genes within the Poaceae. Two lines of evidence argue against these CPSGs being misannotated TEs. First, they share extensive sequence similarity across evolutionarily distinct species within the Poaceae. Second, an additional screen for TE-related structural characteristics and sequence did not flag them. Collectively, these data confirm that we have identified a specific set of genes that are highly conserved within, as well as specific to, the Poaceae.
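The selection criterion described above can be sketched as a simple filter over per-species similarity-search results. A rice gene qualifies if it has significant hits in Poaceae species spanning three subfamilies and no significant hit outside the Poaceae. This is a minimal Python sketch under stated assumptions, not the study's pipeline. The species panel, the subfamily table, the 1e-5 E-value cutoff, and the function names `classify_gene` and `gc_content` are all hypothetical. The GC helper only illustrates the kind of compositional feature the comparison relies on.

```python
"""Sketch: flag a rice gene as a conserved Poaceae-specific gene (CPSG).

The species panel and thresholds are illustrative assumptions,
not the parameters used in the study.
"""

# Hypothetical species panel mapping Poaceae species to their subfamily.
POACEAE_SUBFAMILY = {
    "Oryza sativa": "Ehrhartoideae",
    "Zea mays": "Panicoideae",
    "Sorghum bicolor": "Panicoideae",
    "Triticum aestivum": "Pooideae",
    "Hordeum vulgare": "Pooideae",
}

# Hypothetical E-value cutoff for calling a search hit "significant".
E_VALUE_CUTOFF = 1e-5


def classify_gene(best_evalues, query_species="Oryza sativa", min_subfamilies=3):
    """Classify one gene from its best-hit E-value in each searched species.

    best_evalues maps species name -> best E-value of the gene's hit there.
    Returns "CPSG", "non-Poaceae homolog", or "Poaceae-restricted".
    """
    # Keep only significant hits to species other than the query itself.
    significant = {
        sp for sp, e in best_evalues.items()
        if sp != query_species and e <= E_VALUE_CUTOFF
    }
    # Any significant hit outside the Poaceae panel disqualifies the gene.
    if any(sp not in POACEAE_SUBFAMILY for sp in significant):
        return "non-Poaceae homolog"
    # Count supported subfamilies, including the query's own subfamily.
    subfamilies = {POACEAE_SUBFAMILY[sp] for sp in significant}
    subfamilies.add(POACEAE_SUBFAMILY[query_species])
    if len(subfamilies) >= min_subfamilies:
        return "CPSG"
    return "Poaceae-restricted"


def gc_content(seq):
    """Fraction of G or C among unambiguous A/C/G/T bases in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    hits = {"Zea mays": 1e-30, "Triticum aestivum": 1e-12, "Arabidopsis thaliana": 0.5}
    print(classify_gene(hits))  # CPSG
    print(f"{103 / 861:.1%}")   # 12.0%, the syntenic fraction reported above
```

Here a weak Arabidopsis hit (E = 0.5) does not count against the gene. Lowering that E-value below the cutoff would reclassify the gene as a non-Poaceae homolog. Requiring subfamily breadth rather than a raw species count is one way to encode "conserved across three separate Poaceae subfamilies". How the study actually combined species and subfamilies is not specified here.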
