Twelve Platinum-Standard Reference Genome Sequences (PSRefSeq) that complete the full range of genetic diversity of Asian rice

As the human population grows from 7.8 billion to 10 billion over the next 30 years, breeders must do everything possible to create crops that are highly productive and nutritious while having a smaller environmental footprint. Rice will play a critical role in meeting this demand, and thus knowledge of the full repertoire of genetic diversity held in germplasm banks across the globe is required. To that end, we describe the generation, validation, and preliminary analyses of transposable element and long-range structural variation content of 12 near-gap-free reference genome sequences (RefSeqs) from representatives of 12 of the 15 subpopulations of cultivated rice. When combined with 4 existing RefSeqs that represent the 3 remaining rice subpopulations and the largest admixed population, this collection of 16 Platinum-Standard RefSeqs (PSRefSeq) can be used as a pan-genome template for mapping resequencing data to detect virtually all standing natural variation that exists in the pan-cultivated rice genome.
