COSMOS: accurate detection of somatic structural variations through asymmetric comparison between tumor and normal samples

An important challenge in cancer genomics is the precise detection of structural variations (SVs) from high-throughput short-read sequencing data, which is hampered by the high false discovery rates of existing analysis tools. Here, we propose an accurate SV detection method named COSMOS, which compares the statistics of the mapped read pairs in tumor samples with those in isogenic normal control samples in a distinct asymmetric manner. COSMOS also prioritizes the candidate SVs using strand-specific read-depth information. Performance tests on modeled tumor genomes revealed that COSMOS outperformed existing methods in terms of F-measure. We also applied COSMOS to an experimental mouse cell-based model, in which SVs were induced by genome engineering and gamma-ray irradiation, and confirmed the calls by polymerase chain reaction. The precision of COSMOS was 84.5%, whereas that of the next best existing method was 70.4%. Moreover, the sensitivity of COSMOS was the highest among the methods compared, indicating that COSMOS has great potential for cancer genome analysis.
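To make the two ideas in the abstract concrete, the following minimal Python sketch illustrates (i) one possible reading of an asymmetric tumor/normal comparison and (ii) how precision, sensitivity, and F-measure are computed from a call set. It is not the COSMOS implementation: the `Candidate` record, the `call_somatic` function, and the thresholds `min_tumor` and `max_normal` are hypothetical names and values chosen for illustration, and the toy candidates are not data from the study. The assumption is that a candidate must be well supported in the tumor but is rejected on even weak support in the normal sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical SV candidate; field names are illustrative assumptions."""
    name: str
    tumor_support: int   # read pairs supporting the SV in the tumor sample
    normal_support: int  # supporting read pairs found in the normal sample


def call_somatic(candidates, min_tumor=4, max_normal=0):
    """Asymmetric filter (assumed thresholds): strict evidence is required
    in the tumor, while any support above max_normal in the normal sample
    rejects the candidate."""
    return [
        c.name
        for c in candidates
        if c.tumor_support >= min_tumor and c.normal_support <= max_normal
    ]


def f_measure(called, truth):
    """Return (precision, sensitivity, F-measure) for a call set,
    where F-measure is the harmonic mean of precision and sensitivity."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    if precision + sensitivity == 0:
        return precision, sensitivity, 0.0
    f = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f


# Toy input, not data from the study.
candidates = [
    Candidate("sv1", tumor_support=10, normal_support=0),
    Candidate("sv2", tumor_support=6, normal_support=1),  # seen in normal
    Candidate("sv3", tumor_support=2, normal_support=0),  # weak in tumor
    Candidate("sv4", tumor_support=5, normal_support=0),
]
calls = call_somatic(candidates)
print(calls, f_measure(calls, truth={"sv1", "sv3"}))
```

With these toy values, `sv2` is rejected because it has support in the normal sample and `sv3` because its tumor support is below the threshold. The F-measure balances the two error types, so a method cannot score well by being only precise or only sensitive.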
