Toolkit for automated and rapid discovery of structural variants.

Structural variations (SV) are broadly defined as genomic alterations that affect >50bp of DNA, which are shown to have significant effect on evolution and disease. The advent of high throughput sequencing (HTS) technologies and the ability to perform whole genome sequencing (WGS), makes it feasible to study these variants in depth. However, discovery of all forms of SV using WGS has proven to be challenging as the short reads produced by the predominant HTS platforms (<200bp for current technologies) and the fact that most genomes include large amounts of repeats make it very difficult to unambiguously map and accurately characterize such variants. Furthermore, existing tools for SV discovery are primarily developed for only a few of the SV types, which may have conflicting sequence signatures (i.e. read pairs, read depth, split reads) with other, untargeted SV classes. Here we are introduce a new framework, Tardis, which combines multiple read signatures into a single package to characterize most SV types simultaneously, while preventing such conflicts. Tardis also has a modular structure that makes it easy to extend for the discovery of additional forms of SV.

[1]  Faraz Hach,et al.  Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery , 2010, Bioinform..

[2]  Thomas Zichner,et al.  DELLY: structural variant discovery by integrated paired-end and split-read analysis , 2012, Bioinform..

[3]  Kai Ye,et al.  Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads , 2009, Bioinform..

[4]  John Huddleston,et al.  An Incomplete Understanding of Human Genetic Variation , 2016, Genetics.

[5]  Alexander Schliep,et al.  CLEVER: clique-enumerating variant finder , 2012, Bioinform..

[6]  Joshua M. Korn,et al.  Discovery and genotyping of genome structural polymorphism by sequencing on a population scale , 2011, Nature Genetics.

[7]  Kenny Q. Ye,et al.  Mapping copy number variation by population scale genome sequencing , 2010, Nature.

[8]  J. Kitzman,et al.  Personalized Copy-Number and Segmental Duplication Maps using Next-Generation Sequencing , 2009, Nature Genetics.

[9]  E. Eichler,et al.  Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. , 2009, Genome research.

[10]  Ryan M. Layer,et al.  LUMPY: a probabilistic framework for structural variant discovery , 2012, Genome Biology.

[11]  Meltz Steinberg Karyn Single haplotype assembly of the human genome from a hydatidiform mole , 2014 .

[12]  Justin M. Zook Extensive sequencing of seven human genomes to characterize benchmark reference materials , 2015 .

[13]  Christopher A. Miller,et al.  ReadDepth: A Parallel R Package for Detecting Copy Number Alterations from Short Sequencing Reads , 2011, PloS one.

[14]  Gabor T. Marth,et al.  A global reference for human genetic variation , 2015, Nature.

[15]  M. Gerstein,et al.  CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. , 2011, Genome research.

[16]  Bradley P. Coe,et al.  Genome structural variation discovery and genotyping , 2011, Nature Reviews Genetics.

[17]  Faraz Hach,et al.  Discovery and genotyping of novel sequence insertions in many sequenced individuals , 2017, Bioinform..

[18]  Paul Medvedev,et al.  Computational methods for discovering structural variation with next-generation sequencing , 2009, Nature Methods.

[19]  G. McVean,et al.  A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree , 2016, bioRxiv.

[20]  P. Kwok,et al.  A Hybrid Approach for de novo Human Genome Sequence Assembly and Phasing , 2016, Nature Methods.

[21]  Ali Bashir,et al.  A geometric approach for classification and comparison of structural variants , 2009, Bioinform..

[22]  Benjamin J. Raphael,et al.  An integrative probabilistic model for identification of structural variation in sequencing data , 2012, Genome Biology.

[23]  Ira M. Hall,et al.  Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome. , 2010, Genome research.

[24]  Faraz Hach,et al.  Sensitive and fast mapping of di-base encoded reads , 2011, Bioinform..

[25]  Inanç Birol,et al.  Detection and characterization of novel sequence insertions using paired-end next-generation sequencing , 2010, Bioinform..

[26]  Arcadi Navarro,et al.  Great ape genetic diversity and population history , 2013, Nature.

[27]  C. Alkan,et al.  MoDIL: detecting small indels from clone-end sequencing with mixtures of distributions , 2009, Nature Methods.

[28]  R. Wilson,et al.  BreakDancer: An algorithm for high resolution mapping of genomic structural variation , 2009, Nature Methods.

[29]  Mark Gerstein,et al.  VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications , 2014, Bioinform..

[30]  Can Alkan,et al.  On genomic repeats and reproducibility , 2016, Bioinform..

[31]  Bradley P. Coe,et al.  Global diversity, population stratification, and selection of human copy-number variation , 2015, Science.

[32]  Ali Bashir,et al.  Evaluation of Paired-End Sequencing Strategies for Detection of Genome Rearrangements in Cancer , 2008, PLoS Comput. Biol..

[33]  P. Stankiewicz,et al.  Structural variation in the human genome and its role in disease. , 2010, Annual review of medicine.

[34]  Philip M. Kim,et al.  Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome , 2007, Science.