AbsCN-seq: a statistical method to estimate tumor purity, ploidy and absolute copy numbers from next-generation sequencing data

MOTIVATION: Detection and quantification of absolute DNA copy number alterations in tumor cells are challenging because the DNA specimen is extracted from a mixture of tumor and normal stromal cells. Estimates of tumor purity and ploidy are necessary to correctly infer copy number, and ploidy may itself be a prognostic factor in cancer progression. As deep sequencing of the exome or genome has become routine for characterization of tumor samples, we aim to develop a simple and robust algorithm that infers purity, ploidy and integer absolute copy numbers for tumor cells from sequencing data.

RESULTS: A simulation study shows that the estimates have reasonable accuracy and that the algorithm is robust to segmentation errors and to the presence of subclonal populations. We validated our algorithm against a panel of cell lines with experimentally determined ploidy. We also compared our algorithm with ABSOLUTE, a well-established single-nucleotide polymorphism array-based method, on three sets of tumors of different types. Our method performed well on these four benchmark datasets for both purity and ploidy estimates, and may offer a simple solution to copy number alteration quantification for cancer sequencing projects.

AVAILABILITY AND IMPLEMENTATION: The R package absCNseq is available from http://biostats.mcc.ucsd.edu/files/absCNseq_1.0.tar.gz

CONTACT: kmesser@ucsd.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
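To illustrate why purity and ploidy are needed before copy number can be read off sequencing data, the sketch below uses the standard two-population mixture relation between an observed tumor/normal copy ratio and an integer tumor copy number. It is a minimal sketch under stated assumptions, not the absCNseq implementation: the function names, the diploid normal component, and the example values are all hypothetical, and the abstract itself does not give the model's equations.

```python
def expected_ratio(copy_number, purity, ploidy):
    """Expected tumor/normal copy ratio for a segment.

    Assumes (not stated in the abstract) a mixture of tumor cells with
    `copy_number` copies and diploid normal cells, normalized by the
    sample's average copy number.
    """
    normal_part = 2.0 * (1.0 - purity)
    return (purity * copy_number + normal_part) / (purity * ploidy + normal_part)


def absolute_copy_number(ratio, purity, ploidy):
    """Invert the mixture relation and round to a non-negative integer."""
    normal_part = 2.0 * (1.0 - purity)
    raw = (ratio * (purity * ploidy + normal_part) - normal_part) / purity
    return max(0, round(raw))


# Hypothetical example: 60% pure, triploid tumor, segment with 4 copies.
ratio = expected_ratio(4, purity=0.6, ploidy=3.0)
recovered = absolute_copy_number(ratio, purity=0.6, ploidy=3.0)
```

With the wrong purity or ploidy the same observed ratio maps to a different integer, which is why the three quantities have to be estimated jointly.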
