Samplot: A Platform for Structural Variant Visual Validation and Automated Filtering

Visual validation is an essential step for minimizing false-positive predictions in structural variant (SV) detection. We present Samplot, a tool for quickly creating images that display the read depth and sequence alignments needed to adjudicate putative SVs across multiple samples and sequencing technologies, including short, long, and phased reads. These simple images can be reviewed rapidly, which makes it practical to curate large SV call sets. Samplot applies readily to many biological problems, such as prioritizing potentially causal variants in disease studies, family-based analysis of inherited variation, and review of de novo SVs. Samplot also includes a trained machine learning package that dramatically reduces the number of false positives without human review. Samplot is available via the conda package manager or at https://github.com/ryanlayer/samplot. Contact: Ryan Layer, Ph.D., Assistant Professor, University of Colorado Boulder, ryan.layer@colorado.edu.
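To show the kind of read-depth evidence such an image lets a reviewer judge, the sketch below computes the ratio of mean depth inside a candidate deletion to mean depth in equal-length flanking windows. This is an illustration only, not Samplot's implementation. The function name `depth_fold_change`, the per-base depth list as input, and the default flank length are assumptions made for this example.

```python
from statistics import mean


def depth_fold_change(depth, start, end, flank=None):
    """Return mean depth in [start, end) divided by mean depth of the flanks.

    Hypothetical helper for illustration (not part of Samplot).
    depth: per-base read depth as a list of numbers, 0-based positions.
    flank: flank length on each side; defaults to the SV length (assumption).
    """
    if not 0 < start < end < len(depth):
        raise ValueError("SV interval must lie strictly inside the depth array")
    if flank is None:
        flank = end - start
    # Take up to `flank` bases on each side of the candidate SV.
    left = depth[max(0, start - flank):start]
    right = depth[end:end + flank]
    flank_mean = mean(left + right)
    if flank_mean == 0:
        # No coverage in the flanks: the ratio is undefined.
        return float("nan")
    return mean(depth[start:end]) / flank_mean


# Simulated diploid coverage of 30x with a heterozygous deletion at [100, 150).
depth = [30] * 100 + [15] * 50 + [30] * 100
print(depth_fold_change(depth, 100, 150))  # 0.5
```

In a diploid genome, a ratio near 0.5 is consistent with a heterozygous deletion and a ratio near 0 with a homozygous one. In a Samplot image, the same signal appears as a drop in the coverage track between the breakpoints, which the reviewer weighs together with the supporting alignments.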