Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads

The development of high-throughput sequencing technologies has advanced our understanding of cancer. However, characterizing somatic structural variants in tumor genomes remains challenging because current strategies depend on the initial alignment of reads to a reference genome. Here, we describe SMUFIN (somatic mutation finder), a single program that directly compares sequence reads from normal and tumor genomes to accurately identify and characterize a range of somatic sequence variation, from single-nucleotide variants (SNVs) to large structural variants, at base-pair resolution. Performance tests on modeled tumor genomes showed average sensitivities of 92% and 74% for SNVs and structural variants, respectively, with corresponding specificities of 95% and 91%. Analyses of aggressive forms of solid and hematological tumors revealed that SMUFIN identifies breakpoints associated with chromothripsis and chromoplexy with high specificity. SMUFIN provides an integrated solution for the accurate, fast and comprehensive characterization of somatic sequence variation in cancer.
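
To make the idea of comparing normal and tumor reads directly, without first aligning them to a reference, more concrete, the following is a minimal illustrative sketch in Python. It is not SMUFIN's algorithm, which this abstract does not specify; it only shows one simple reference-free comparison, namely collecting k-mers that recur in tumor reads but never appear in normal reads. The function name `tumor_specific_kmers` and the parameters `k` and `min_count` are hypothetical, and strand (reverse complements) and sequencing errors are ignored for brevity.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def tumor_specific_kmers(normal_reads, tumor_reads, k=5, min_count=2):
    """Return k-mers seen at least min_count times in tumor reads and
    never in normal reads.

    Hypothetical helper for illustration only: such k-mers are candidate
    markers of somatic variation, since they are absent from the matched
    normal sample. No reference genome is consulted.
    """
    normal = kmer_counts(normal_reads, k)
    tumor = kmer_counts(tumor_reads, k)
    return {
        kmer
        for kmer, count in tumor.items()
        if count >= min_count and kmer not in normal
    }


# Toy example: the tumor reads carry a single A>T substitution at position 5.
normal_reads = ["ACGTACGTAC"]
tumor_reads = ["ACGTTCGTAC", "ACGTTCGTAC"]
candidates = tumor_specific_kmers(normal_reads, tumor_reads, k=5, min_count=2)
print(sorted(candidates))
```

In this toy input every tumor-specific k-mer overlaps the substituted base, which is why a direct comparison of this kind can localize a variant without a reference; the `min_count` threshold is a crude stand-in for filtering out sequencing errors.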
