A novel high-accuracy genome assembly method utilizing a high-throughput workflow

Across domains of biological research that use genome sequence data, high-quality reference genome sequences are essential for characterizing genetic variation and understanding the genetic basis of phenotypes. However, the construction of genome assemblies for many species is often hampered by the complexity of genome organization, especially repetitive and complex sequences, which leads to mis-assemblies and missing regions. Here, we describe a high-throughput, gold-standard genome assembly workflow that uses a large-scale bacterial artificial chromosome (BAC) library with a refined two-step pooling strategy and the Lamp assembler algorithm. This strategy minimizes the laborious processes of physical map construction and clone-by-clone sequencing, enabling inexpensive sequencing of several thousand BAC clones. Applying this strategy to a minimum tiling path BAC clone library for the short arm of chromosome 2D (2DS) of bread wheat, a species with a highly complex and repetitive genome, we correctly assembled 98% of BAC sequences, covering 92.7% of 2DS. We also identified 48 large mis-assemblies in the reference wheat genome assembly (IWGSC RefSeq v1.0), corrected them, and filled 92.2% of the gaps in RefSeq v1.0. Our 2DS assembly represents a new benchmark for assembling complex genomes with both high accuracy and efficiency.


[69]  Jennifer A. Doudna,et al.  THE PROMISE AND CHALLENGE OF THERAPEUTIC GENOME EDITING , 2020, Nature.

[70]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.