The Genomes of Oryza sativa: A History of Duplications

We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.

Dawei Li | Huanming Yang | Jun Wang | Wei-Mou Zheng | Songgang Li | R. Samudrala | Feng Zhang | Ruiqiang Li | Jun Li | Heng Li | Yan Zhou | Peixiang Ni | Dongyuan Liu | Jianguo Zhang | Jia Ye | L. Fang | Hongkun Zheng | G. Wong | B. Liu | Changqing Zeng | Wei Lin | Jun Zhou | Weiqi Wang | Songnian Hu | Jun Yu | Yong Zhang | Jing Wang | Ximiao He | C. Ye | D. Bu | Jingfen Zhang | Yajun Deng | Mengliang Cao | W. Tong | Lijuan Cong | J. Geng | Yujun Han | Xiangang Huang | Wenjie Li | Q. Qi | Jinsong Liu | Wei Dong | Xiaoyu Ren | Xianran Li | Zhao Xu | Wenming Zhao | Bailin Hao | Longping Yuan | Shengting Li | Lei Gao | Xiaoling Wang | N. Li | Zuyuan Xu | Liang Lin | Jianning Yin | Guangyuan Li | Jianping Shi | Juan Liu | Hong Lv | L. Ran | Xiaoli Shi | Xiyin Wang | Qingfa Wu | Changfeng Li | Jingqiang Wang | Xiaowei Zhang | Zhendong Ji | Yongqiao Sun | Zhen-peng Zhang | J. Bao | Lingli Dong | Jia Ji | Peng Chen | Shuming Wu | Ying Xiao | J. Tan | Li Yang | Jingyi Xu | Yingpu Yu | Bingop Zhang | Shulin Zhuang | Haibin Wei | M. Lei | Hong Yu | Yuan-Zhe Li | Hao Xu | Shulin Wei | Li-li Fang | Zengjin Zhang | Yunze Zhang | Zhixi Su | Jinhong Li | Z. Tong | Shuangli Li | Lishun Wang | T. Lei | Chen Chen | Huan Chen | Haihong Li | Haiyan Huang | Huayong Xu | Caifeng Zhao | Shuting Li | L. Dong | Yanqing Huang | Long Li | Yan Xi | Bo-Jun Zhang | Wei Hu | Yanling Zhang | X. Tian | Yongzhi Jiao | Xi-Hai Liang | Jiao Jin | Siqi Liu | J. Mcdermott | Jian Wang | Wen Wang | Qiuhui Qi | Bo Zhang | Meng Lei | Jingyi Xu | Tingting Lei | Jianing Geng
