The Tangent copy-number inference pipeline for cancer genome analyses

Motivation Somatic copy-number alterations (SCNAs) play an important role in cancer development. Systematic noise in sequencing and array data present a significant challenge to the inference of SCNAs for cancer genome analyses. As part of The Cancer Genome Atlas (TCGA), the Broad Institute Genome Characterization Center developed the Tangent copy-number inference pipeline to generate copy-number profiles using single-nucleotide polymorphism (SNP) array and whole-exome sequencing (WES) data from over 10,000 pairs of tumors and matched normal samples. Here, we describe the Tangent pipeline, which begins with DNA sequencing data in the form of .bam files or raw SNP array probe-level intensity data, and ends with segmented copy-number calls to facilitate the identification of novel genes potentially targeted by SCNAs. We also describe a modification of Tangent, Pseudo-Tangent, which enables denoising through comparisons between tumor profiles when few normal samples are available. Results Tangent Normalization offers substantial signal-to-noise ratio (SNR) improvements compared to conventional normalization methods in both SNP array and WES analyses. The improvement in SNRs is achieved primarily through noise reduction with minimal effect on signal. Pseudo-Tangent also reduces noise when few normal samples are available. Tangent and Pseudo-Tangent are broadly applicable and enable more accurate inference of SCNAs from DNA sequencing and array data. Availability and Implementation Tangent is available at https://github.com/coyin/tangent and as a Docker image (https://hub.docker.com/r/coyin/tangent). Tangent is also the normalization method for the Copy Number pipeline in Genome Analysis Toolkit 4 (GATK4). Contact matthew_meyerson@dfci.harvard.edu, rameen@broadinstitute.org, gadgetz@broadinstitute.org

[1]  Joachim Weischenfeldt,et al.  SvABA: genome-wide detection of structural variants and indels by local assembly , 2018, Genome research.

[2]  Regina Bohnert,et al.  Reliability of algorithmic somatic copy number alteration detection from targeted capture data , 2017, Bioinform..

[3]  Eric Talevich,et al.  CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing , 2016, PLoS Comput. Biol..

[4]  C. Wijmenga,et al.  Gene expression analysis identifies global gene dosage sensitivity in cancer , 2015, Nature Genetics.

[5]  A. Bashashati,et al.  TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data , 2014, Genome research.

[6]  S. Gabriel,et al.  Pan-cancer patterns of somatic copy-number alteration , 2013, Nature Genetics.

[7]  Joshua M. Stuart,et al.  The Cancer Genome Atlas Pan-Cancer analysis project , 2013, Nature Genetics.

[8]  Qingguo Wang,et al.  Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives , 2013, BMC Bioinformatics.

[9]  S. Gabriel,et al.  Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability , 2012, Genome research.

[10]  Thomas Zichner,et al.  DELLY: structural variant discovery by integrated paired-end and split-read analysis , 2012, Bioinform..

[11]  Ting Chen,et al.  Supplementary Methods , 2012, Acta Neuropsychiatrica.

[12]  A. McKenna,et al.  Absolute quantification of somatic DNA alterations in human cancer , 2012, Nature Biotechnology.

[13]  Christopher A. Miller,et al.  VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. , 2012, Genome research.

[14]  John Quackenbush,et al.  Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV , 2011, Bioinform..

[15]  M. DePristo,et al.  A framework for variation discovery and genotyping using next-generation DNA sequencing data , 2011, Nature Genetics.

[16]  C. Perou,et al.  Allele-specific copy number analysis of tumors , 2010, Proceedings of the National Academy of Sciences.

[17]  M. DePristo,et al.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.

[18]  Derek Y. Chiang,et al.  The landscape of somatic copy-number alteration across human cancers , 2010, Nature.

[19]  John F. Atkins,et al.  A profusion of upstream open reading frame mechanisms in polyamine-responsive translational regulation , 2009, Nucleic acids research.

[20]  Kenny Q. Ye,et al.  Sensitive and accurate detection of copy number variants using read depth of coverage. , 2009, Genome research.

[21]  Joshua M. Korn,et al.  Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs , 2008, Nature Genetics.

[22]  Joshua M. Korn,et al.  Comprehensive genomic characterization defines human glioblastoma genes and core pathways , 2008, Nature.

[23]  E. Lander,et al.  Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma , 2007, Proceedings of the National Academy of Sciences.

[24]  E. S. Venkatraman,et al.  A faster circular binary segmentation algorithm for the analysis of array CGH data , 2007, Bioinform..

[25]  Matthew Meyerson,et al.  Somatic alterations in the human cancer genome. , 2004, Cancer cell.

[26]  Ashton C. Berger,et al.  Genomic and Functional Approaches to Understanding Cancer Aneuploidy. , 2018, Cancer cell.