svtools: population-scale analysis of structural variation

Summary: Large-scale human genetics studies are now using whole-genome sequencing to conduct comprehensive trait-mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller-scale variants, and there is an urgent need for more efficient tools that scale to the size of human populations. Here we present a fast and highly scalable software toolkit (svtools) and a cloud-based pipeline for assembling high-quality SV maps in many thousands of human genomes. These maps include deletions, duplications, mobile element insertions, inversions, and other rearrangements. We show that this pipeline achieves variant detection performance similar to that of established per-sample methods (e.g., LUMPY), while providing fast and affordable joint analysis at the scale of ≥100,000 genomes. These tools will help enable the next generation of human genetics studies.

Availability and Implementation: svtools is implemented in Python and is freely available under the MIT license from https://github.com/hall-lab/svtools.

Contact: ihall@wustl.edu
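The summary does not describe how per-sample calls are combined. To show what cohort-level joint analysis involves (collapsing per-sample SV calls into one cohort-wide site list), here is a minimal sketch. It assumes deletion calls arrive as (sample, chromosome, start, end) records and that calls match when both breakpoints fall within a fixed tolerance. The names `SVCall`, `MergedSite`, and `merge_calls`, and the 100 bp window, are hypothetical. This is a simple greedy position-window clustering chosen for illustration. It is not the svtools merging algorithm or its API.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass(frozen=True)
class SVCall:
    """Hypothetical minimal per-sample deletion call (not an svtools type)."""
    sample: str
    chrom: str
    start: int
    end: int


@dataclass
class MergedSite:
    """Hypothetical cohort-level site built from one or more per-sample calls."""
    chrom: str
    calls: list = field(default_factory=list)

    @property
    def start(self) -> int:
        # Representative start: median of member call starts (truncated to int).
        return int(median(c.start for c in self.calls))

    @property
    def end(self) -> int:
        # Representative end: median of member call ends (truncated to int).
        return int(median(c.end for c in self.calls))

    @property
    def samples(self) -> set:
        # Samples carrying this variant.
        return {c.sample for c in self.calls}


def merge_calls(calls, window=100):
    """Greedy clustering sketch (an assumption, not svtools' method).

    A call joins the most recently opened site if it is on the same
    chromosome and both its start and end lie within `window` bp of that
    site's first call. Otherwise it opens a new site. Because only the most
    recent site is checked, this simple pass can split calls that a more
    careful method would merge.
    """
    sites = []
    # Sorting by chromosome and position lets one linear pass find clusters.
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        if sites:
            last = sites[-1]
            anchor = last.calls[0]
            if (call.chrom == last.chrom
                    and abs(call.start - anchor.start) <= window
                    and abs(call.end - anchor.end) <= window):
                last.calls.append(call)
                continue
        sites.append(MergedSite(call.chrom, [call]))
    return sites


if __name__ == "__main__":
    calls = [
        SVCall("NA1", "chr1", 10_000, 15_000),
        SVCall("NA2", "chr1", 10_020, 15_030),
        SVCall("NA3", "chr1", 10_050, 14_990),
        SVCall("NA1", "chr2", 500, 2_000),
    ]
    for site in merge_calls(calls):
        print(site.chrom, site.start, site.end, sorted(site.samples))
    # chr1 10020 15000 ['NA1', 'NA2', 'NA3']
    # chr2 500 2000 ['NA1']
```

In this sketch, three samples' slightly offset deletion calls on chr1 collapse into one site carried by all three. Shrinking `window` to 10 bp would leave them as separate sites. The tolerance therefore trades sensitivity to breakpoint imprecision against the risk of merging distinct variants.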
