New whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica

High-throughput genome sequencing has uncovered extensive structural variation in eukaryotic genomes, which makes important contributions to genomic diversity and phenotypic variation. Currently, when the genomes of different strains of a given organism are compared, whole-genome resequencing data are aligned to an established reference sequence. However, when the reference differs structurally in significant ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to explore the extent of structural variation among strains adapted to different ecologies and geographies, and show that this variation can be significant, often matching or exceeding the variation present in closely related human populations or other mammals. We demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next-generation sequence data into high-quality assemblies that can be compared directly to provide an unbiased assessment. Using this approach, we accurately assess the “pan-genome” of three divergent rice varieties and document several megabases of each genome that are absent from the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that standard resequencing approaches would miss. We further provide a detailed analysis of several loci associated with agriculturally important traits, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.
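The pan-genome comparison summarized above rests on finding sequence in one assembly that is absent from the others. The sketch below is a simplified illustration, not the authors' pipeline. It assumes that a whole-genome aligner has already reported, for each chromosome of one assembly, the intervals that align to each of the other two assemblies. Under that assumption, the genome-specific sequence is the complement of the union of those intervals. The function names and the input layout (`aligned[other_genome][chrom]` as half-open intervals) are hypothetical. A real analysis would also need to handle repeats, alignment identity thresholds, and assembly gaps.

```python
from typing import Dict, List, Tuple

# Half-open [start, end) coordinates on a chromosome of the query assembly.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_intervals(chrom_len: int, covered: List[Interval]) -> List[Interval]:
    """Return the parts of [0, chrom_len) not covered by any interval."""
    # Clip to chromosome bounds and drop empty intervals before merging.
    clipped = [(max(s, 0), min(e, chrom_len)) for s, e in covered]
    clipped = [(s, e) for s, e in clipped if s < e]
    gaps: List[Interval] = []
    pos = 0
    for s, e in merge_intervals(clipped):
        if s > pos:
            gaps.append((pos, s))
        pos = e  # merged intervals are sorted and disjoint
    if pos < chrom_len:
        gaps.append((pos, chrom_len))
    return gaps


def genome_specific(
    chrom_lengths: Dict[str, int],
    aligned: Dict[str, Dict[str, List[Interval]]],
) -> Dict[str, List[Interval]]:
    """Hypothetical helper: regions of the query covered by no other genome.

    aligned[other_genome][chrom] lists query intervals that align to other_genome.
    """
    result: Dict[str, List[Interval]] = {}
    for chrom, length in chrom_lengths.items():
        covered = [iv for per_chrom in aligned.values() for iv in per_chrom.get(chrom, [])]
        result[chrom] = uncovered_intervals(length, covered)
    return result


if __name__ == "__main__":
    # Toy input: coverage of query assembly "A" by alignments to "B" and "C".
    lengths = {"chr1": 1000, "chr2": 500}
    aligned = {
        "B": {"chr1": [(0, 400), (600, 1000)], "chr2": [(0, 500)]},
        "C": {"chr1": [(350, 450)]},
    }
    specific = genome_specific(lengths, aligned)
    total = sum(e - s for ivs in specific.values() for s, e in ivs)
    print(specific, total)
```

In this toy input, only `chr1:450-600` (150 bp) is covered by neither other assembly. Half-open coordinates keep interval lengths as `end - start`, the same convention used by BED files.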
