Consistency-based detection of potential tumor-specific deletions in matched normal/tumor genomes

Background: Structural variations in human genomes, such as insertions, deletions, or rearrangements, play an important role in cancer development. Next-Generation Sequencing technologies have been central in providing ways to detect such variations. Most existing methods, however, are limited to the analysis of a single genome, and it is only recently that the comparison of closely related genomes has been considered. In particular, a few recent works considered the analysis of data sets obtained by sequencing both tumor and healthy tissues of the same cancer patient. In that context, the goal is to detect variations that are specific to exactly one of the genomes, for example to differentiate between patient-specific and tumor-specific variations. This is a difficult task, especially when facing the additional challenge of the possible contamination of healthy tissues by tumor cells and vice versa.

Results: In the current work, we analyzed a data set of paired-end short reads, obtained by sequencing tumor tissues and healthy tissues, both from the same cancer patient. Based on a combinatorial notion of conflict between deletions, we show that in the tumor data, more deletions are predicted than there could actually be in a diploid genome. In contrast, the predictions for the data from normal tissues are almost conflict-free. We designed and applied a method, specific to the analysis of such pooled and contaminated data sets, to detect potential tumor-specific deletions. Our method takes the deletion calls from both data sets and assigns reads from the mixed tumor/normal data to the normal one, with the goal of minimizing the number of reads that need to be discarded to obtain a set of conflict-free deletion clusters. We observed that, on the specific data set we analyzed, only a very small fraction of the reads needs to be discarded to obtain a set of consistent deletions.

Conclusions: We present a framework based on a rigorous definition of consistency between deletions and the assumption that the tumor sample also contains normal cells. A combined analysis of both data sets based on this model allowed a consistent explanation of almost all data, providing a detailed picture of candidate patient- and tumor-specific deletions.
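To make the idea of a conflict between deletions concrete, the following minimal sketch checks one plausible reading of it. The abstract does not spell out the definition, so this is an assumption rather than the authors' method: distinct deletions that overlap cannot lie on the same chromosome copy, so in a diploid genome no position should be covered by more than two distinct deletion calls. The names `max_overlap_depth` and `is_conflict_free`, the half-open interval representation, and the `ploidy` parameter are all hypothetical and introduced only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a deletion call is a half-open interval
# [start, end) on a single chromosome.
Deletion = Tuple[int, int]


def max_overlap_depth(deletions: List[Deletion]) -> int:
    """Return the largest number of deletions covering any one position."""
    events = []
    for start, end in deletions:
        events.append((start, 1))   # a deletion begins
        events.append((end, -1))    # a deletion ends
    # At equal coordinates, process ends (-1) before starts (+1), so that
    # intervals that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    depth = 0
    best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def is_conflict_free(deletions: List[Deletion], ploidy: int = 2) -> bool:
    """Assumed criterion: no position is covered by more than `ploidy`
    distinct deletions."""
    return max_overlap_depth(deletions) <= ploidy


# Made-up example calls, not data from the study.
normal_calls = [(100, 200), (150, 250), (300, 400)]
tumor_calls = [(100, 200), (150, 250), (180, 220)]
```

Under this assumed criterion, `normal_calls` is conflict-free because at most two calls overlap anywhere, while `tumor_calls` is not, because three calls cover the region from 180 to 200. The sketch only detects conflicts; it does not attempt the read assignment and minimization step described above.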
