PureCN: copy number calling and SNV classification using targeted short read sequencing

Background: Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore, an algorithm that reliably classifies variants would be helpful for retrospective exploratory analyses. Contamination of tumor samples with normal cells results in differences in expected allelic fractions of germline and somatic variants, which can be exploited to accurately infer genotypes after adjusting for local copy number. However, existing algorithms for determining tumor purity, ploidy and copy number are not designed for unmatched short read sequencing data.

Results: We describe a methodology and corresponding open source software for estimating tumor purity, copy number, loss of heterozygosity (LOH), and contamination, and for classification of single nucleotide variants (SNVs) by somatic status and clonality. This R package, PureCN, is optimized for targeted short read sequencing data, integrates well with standard somatic variant detection pipelines, and has support for matched and unmatched tumor samples. Accuracy is demonstrated on simulated data and on real whole exome sequencing data.

Conclusions: Our algorithm provides accurate estimates of tumor purity and ploidy, even if matched normal samples are not available. This in turn allows accurate classification of SNVs. The software is provided as an open source (Artistic License 2.0) R/Bioconductor package, PureCN (http://bioconductor.org/packages/PureCN/).
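The Background's point about allelic fractions can be made concrete with a small calculation. The following is a minimal sketch in Python, assuming the commonly used mixture model in which normal cells are diploid and a heterozygous germline variant is present on one copy in normal cells. The function and parameter names are hypothetical and chosen for illustration; this is not PureCN's implementation or API.

```python
def expected_allelic_fraction(purity, tumor_copies, multiplicity, germline):
    """Expected fraction of reads carrying the variant allele (illustrative model).

    purity        -- fraction of tumor cells in the sample, in [0, 1]
    tumor_copies  -- local total copy number in tumor cells
    multiplicity  -- number of tumor copies carrying the variant
    germline      -- True for a heterozygous germline variant, False for somatic

    Assumption: normal cells are diploid at this locus.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= multiplicity <= tumor_copies:
        raise ValueError("multiplicity must be between 0 and tumor_copies")

    normal_fraction = 1.0 - purity
    # Average number of copies of the locus per cell in the mixed sample.
    total_copies = purity * tumor_copies + 2.0 * normal_fraction
    if total_copies == 0:
        raise ValueError("locus is absent from the sample")

    # Variant copies in tumor cells, plus one copy in each normal cell
    # when the variant is a heterozygous germline variant.
    variant_copies = purity * multiplicity
    if germline:
        variant_copies += normal_fraction
    return variant_copies / total_copies


# Example: 60% purity, copy-neutral diploid segment, variant on one copy.
somatic_af = expected_allelic_fraction(0.6, 2, 1, germline=False)
germline_af = expected_allelic_fraction(0.6, 2, 1, germline=True)
```

Under these assumptions the somatic variant is expected near an allelic fraction of 0.3 and the germline variant near 0.5. The gap between the two narrows as purity approaches 1, and it shifts with local copy number, which is why the classification depends on estimating purity and copy number first.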
