TIDDIT, an efficient and comprehensive structural variant caller

Reliable detection of large structural variation (> 1000 bp) is important in both rare and common genetic disorders. Whole genome sequencing (WGS) can identify a large proportion of the genomic structural variants (SVs) in an individual in a single experiment. Although SV callers have been used extensively in research to detect mutations, their use in routine clinical diagnostics is hindered by high computational costs, non-standard output formats, and limited support for the various sequencing platforms and library types. Another well-known but poorly addressed problem is the large number of benign variants and reference errors in the human genome, which further complicate analysis. Here we present TIDDIT, a time-efficient variant caller that uses discordant read pairs as well as depth of coverage and split reads to detect and classify a large spectrum of SVs. As part of the software suite, TIDDIT also includes a database functionality that enables filtering for rare variants and reduces the number of false positive calls and background noise. Benchmarked against five state-of-the-art SV callers, TIDDIT performs at an equal or superior level while using only 2 CPU hours per sample. Thanks to its speed, sensitivity, flexibility, and ability to detect variants easily across a wide range of WGS library types, TIDDIT solves many of the problems that currently hinder the use of WGS for SV calling in clinical settings.
