Accucopy: accurate and fast inference of allele-specific copy number alterations from low-coverage low-purity tumor sequencing data

Background: Copy number alterations (CNAs) have a large impact on the genome and are an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from shallow sequencing data of a low-purity tumor sample remains a challenging task.

Results: We introduce Accucopy, a method to infer total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from challenging low-purity and low-coverage tumor samples. Accucopy adopts robust statistical techniques, such as kernel smoothing of coverage differentiation information, to discern signal from noise. It combines ideas from time-series analysis and signal processing to derive a range of estimates for the period in a histogram of coverage differentiation information. Statistical learning models, namely the tiered Gaussian mixture model, the Expectation-Maximization (EM) algorithm, and Sparse Bayesian Learning (SBL), are customized and built into the model. Accucopy is implemented in C++/Rust, packaged in a Docker image, and supports non-human samples; more information is available at http://www.yfish.org/software/.

Conclusions: We describe Accucopy, a method that can predict both TCNs and ASCNs from low-coverage low-purity tumor sequencing data. Through comparative analyses of both simulated and real sequencing samples, we demonstrate that Accucopy is more accurate than Sclust, ABSOLUTE, and Sequenza.
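The idea of estimating the period in a smoothed histogram, mentioned in the Results, can be illustrated with a minimal Python sketch. This is not Accucopy's implementation: the abstract does not define how the coverage differentiation information is computed, so the input here is synthetic, and the function names (`simulate_peaks`, `estimate_period`), the Gaussian kernel, the bandwidth, and the use of the first positive autocorrelation lobe are all assumptions made for illustration.

```python
import numpy as np


def simulate_peaks(rng, first=0.6, spacing=0.2, n_peaks=5, per_peak=1000, sd=0.02):
    """Hypothetical stand-in data: values clustered around evenly spaced peaks."""
    centers = first + spacing * np.arange(n_peaks)
    return np.concatenate([rng.normal(c, sd, per_peak) for c in centers])


def estimate_period(values, lo, hi, n_bins=1000, bandwidth=0.02):
    """Estimate the spacing between evenly spaced peaks in a histogram.

    Steps (an illustrative choice, not the published method):
      1. histogram the values,
      2. smooth the counts with a Gaussian kernel,
      3. autocorrelate the mean-centered smoothed counts,
      4. report the lag at the maximum of the first positive lobe
         that follows the first negative excursion.
    Returns None if no such lobe exists.
    """
    bin_width = (hi - lo) / n_bins
    counts, _ = np.histogram(values, bins=n_bins, range=(lo, hi))

    # Gaussian kernel truncated at four bandwidths, normalized to sum to 1.
    half = int(np.ceil(4 * bandwidth / bin_width))
    offsets = np.arange(-half, half + 1) * bin_width
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2)
    kernel /= kernel.sum()
    density = np.convolve(counts, kernel, mode="same")

    # Autocorrelation at non-negative lags, scaled so that lag 0 equals 1.
    centered = density - density.mean()
    acf = np.correlate(centered, centered, mode="full")[n_bins - 1:]
    if acf[0] <= 0:
        return None
    acf = acf / acf[0]

    negative = np.flatnonzero(acf < 0)
    if negative.size == 0:
        return None
    positive = np.flatnonzero(acf[negative[0]:] > 0)
    if positive.size == 0:
        return None
    lobe_start = negative[0] + positive[0]
    later_negative = np.flatnonzero(acf[lobe_start:] < 0)
    lobe_end = lobe_start + later_negative[0] if later_negative.size else acf.size

    peak_lag = lobe_start + int(np.argmax(acf[lobe_start:lobe_end]))
    return float(peak_lag * bin_width)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    values = simulate_peaks(rng)
    print("estimated period:", estimate_period(values, lo=0.0, hi=2.0))
```

On synthetic data like this, the estimate should land close to the simulated peak spacing. Real data would likely need a data-driven bandwidth and, as the abstract notes, a range of candidate periods rather than a single value.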
