A chromosome-scale assembly of the sorghum genome using nanopore sequencing and optical mapping

Long-read sequencing technologies have greatly facilitated the assembly of large eukaryotic genomes. In this paper, Oxford Nanopore sequences generated on a MinION sequencer are combined with Bionano Genomics Direct Label and Stain (DLS) optical maps to generate a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final assembly consists of 29 scaffolds, most of which span entire chromosome arms. It has a scaffold N50 of 33.28 Mbp and covers 90% of the expected genome length. Aligning the assembly against Illumina Tx430 data gives a sequence accuracy of 99.85%, and 99.6% of the 34,211 public gene models align to the assembly. Comparisons of Tx430 and BTx623 DLS maps against the public BTx623 v3.0.1 genome assembly suggest substantial discrepancies whose origin remains to be determined. In summary, this study demonstrates that informative assemblies of complex plant genomes can be generated by combining nanopore sequencing with DLS optical maps.

Assembly of large, repeat-rich eukaryotic genomes remains challenging. Here, the authors use Bionano Genomics DLS optical mapping and single-molecule nanopore sequencing to generate a chromosome-scale assembly of a new Sorghum bicolor accession and identify variation compared to the publicly available S. bicolor genome.
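For readers unfamiliar with the scaffold N50 statistic reported above, the following is a minimal sketch of how it is conventionally computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not data from the paper.
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 40
```

In this toy case the total length is 150, so the running sum first reaches 75 at the second-longest scaffold, giving an N50 of 40.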
