Correcting genomic deletion calls with complex boundaries from next generation sequencing data

Along with tumor growth, somatic alterations continually accumulate, and some of them lead to the formation of clonal populations. Genomic deletion is a major type of such alteration. Dozens of computational methods for detecting genomic deletions from next generation sequencing data have been published in the past decade. However, existing algorithms often lose accuracy on deletion calls with complex boundaries. It has been reported that a genomic deletion occurring in different sub-clones may present nearby boundaries; such a deletion is considered a deletion with complex boundaries. Existing approaches either ignore complex-boundary cases, reporting only the pair of boundaries with the largest number of supporting reads, or produce incorrect results because of interfering data signals. To overcome this weakness, in this paper we propose a heuristic method, SV-Del, that helps popular methods correct the detection errors introduced by complex boundaries. SV-Del takes the results of an existing method as candidate calls, filters these calls, and identifies the ones with complex boundaries. The proposed method first adopts a segmented extension algorithm and uses the longest variable splitting-read strategy to detect the possible pairs of boundaries in each candidate region. It then uses the longest variable splitting reads to correct detection errors that may be introduced by clonal SNVs. To differentiate detection errors from possible pairs of deletion boundaries, SV-Del estimates the number of sub-clones across sampled candidate regions and then uses a gradually separating algorithm to obtain and refine the candidate calls. We applied SV-Del to a series of simulated datasets generated under different settings. The experimental results demonstrate that detection accuracy is significantly improved compared with the original results. SV-Del is also shown to be robust.
The source code and software package of SV-Del are available at https://github.com/Hope523/SV-Del for academic use only.
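
The difference between reporting only the best-supported boundary pair and recognizing a deletion with complex boundaries can be illustrated with a minimal Python sketch. This is not SV-Del's algorithm. It assumes that each split read in one candidate region has already been reduced to a (left, right) boundary pair. The names `cluster_boundary_pairs` and `report_boundaries` and the `tolerance` and `min_support` parameters are hypothetical. The sketch first merges near-identical pairs to absorb alignment jitter. It then contrasts the single top-supported pair, which is what the abstract says existing approaches report, with every sufficiently supported pair. Two or more supported pairs flag the region as a possible complex-boundary deletion.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: (left breakpoint, right breakpoint) implied by one split read.
BoundaryPair = Tuple[int, int]


def cluster_boundary_pairs(
    pairs: List[BoundaryPair], tolerance: int = 5
) -> List[Tuple[BoundaryPair, int]]:
    """Greedily merge boundary pairs whose left and right ends both lie within
    `tolerance` bp of a cluster representative. Returns (representative, support)
    tuples sorted by decreasing support."""
    counts = Counter(pairs)
    clusters: List[list] = []  # each item: [representative pair, support]
    # Visit the most frequent exact pairs first so they become representatives.
    for pair, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for cluster in clusters:
            rep = cluster[0]
            if abs(rep[0] - pair[0]) <= tolerance and abs(rep[1] - pair[1]) <= tolerance:
                cluster[1] += n
                break
        else:  # no existing cluster is close enough: start a new one
            clusters.append([pair, n])
    clusters.sort(key=lambda c: (-c[1], c[0]))
    return [(rep, support) for rep, support in clusters]


def report_boundaries(
    pairs: List[BoundaryPair], tolerance: int = 5, min_support: int = 3
):
    """Contrast a top-pair-only report with a report of every supported pair."""
    clusters = cluster_boundary_pairs(pairs, tolerance)
    top_only = clusters[:1]  # the single best-supported pair
    supported = [c for c in clusters if c[1] >= min_support]
    is_complex = len(supported) >= 2  # several nearby, well-supported boundary pairs
    return top_only, supported, is_complex


if __name__ == "__main__":
    # Toy region: two sub-clonal deletions with nearby boundaries, plus jitter and one noisy read.
    reads = [(1000, 2000)] * 6 + [(1002, 1999)] * 2 + [(1030, 2050)] * 4 + [(1500, 1800)]
    top, supported, is_complex = report_boundaries(reads)
    print(top)         # [((1000, 2000), 8)]
    print(supported)   # [((1000, 2000), 8), ((1030, 2050), 4)]
    print(is_complex)  # True
```

The greedy, fixed-tolerance clustering is order-dependent and deliberately simple. It does not reproduce the segmented extension algorithm, the longest variable splitting-read strategy, the sub-clone estimation, or the gradually separating algorithm described above.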
