Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome

We sequenced the genome of the Yoruban individual NA19240 on the Oxford Nanopore PromethION long-read platform to benchmark and evaluate recently published aligners and structural variant callers. We determine the precision and recall of these tools, present high-confidence and high-sensitivity variant call sets, and discuss optimal parameters. In our study, the aligner Minimap2 and the structural variant caller Sniffles are both the most accurate and the most computationally efficient tools. We describe a scalable workflow for the identification, annotation, and characterization of tens of thousands of structural variants from long-read genome sequencing of an individual or a population. The results for this genome give an approximation of what can be expected in future long-read sequencing studies that aim to identify structural variants.
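
As a rough illustration of how precision and recall might be computed for a structural variant call set, the minimal sketch below matches calls to a truth set by chromosome, variant type, and breakpoint distance. The `SV` record, the 500 bp `max_dist` tolerance, and the greedy one-to-one matching are assumptions made for illustration; they are not the procedure used in this study.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    """Hypothetical minimal structural variant record."""
    chrom: str
    pos: int      # start coordinate of the variant
    svtype: str   # e.g. "DEL", "INS", "INV"


def is_match(call: SV, truth: SV, max_dist: int) -> bool:
    """A call matches a truth variant if chromosome and type agree
    and the start positions lie within max_dist bases of each other."""
    return (
        call.chrom == truth.chrom
        and call.svtype == truth.svtype
        and abs(call.pos - truth.pos) <= max_dist
    )


def precision_recall(
    calls: List[SV], truth: List[SV], max_dist: int = 500
) -> Tuple[float, float]:
    """Greedy one-to-one matching: each truth variant can be claimed
    by at most one call. Returns (precision, recall)."""
    used = set()          # indices of truth variants already matched
    true_positives = 0
    for call in calls:
        for i, t in enumerate(truth):
            if i not in used and is_match(call, t, max_dist):
                used.add(i)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data, invented for this example only.
    truth_set = [SV("chr1", 1000, "DEL"), SV("chr1", 5000, "INS"), SV("chr2", 300, "DEL")]
    call_set = [SV("chr1", 1100, "DEL"), SV("chr1", 9000, "INS")]
    p, r = precision_recall(call_set, truth_set)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy example one of the two calls matches a truth variant, so precision is 1/2 and recall is 1/3. A real comparison would also need to account for variant length, genotype, and breakpoint uncertainty, which this sketch ignores.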
