Methods for copy number aberration detection from single-cell DNA-sequencing data

Single-cell DNA sequencing technologies are enabling the study of mutations and their evolutionary trajectories in cancer. Somatic copy number aberrations (CNAs) have been implicated in the development and progression of various types of cancer. A wide array of methods for CNA detection has been either developed specifically for, or adapted to, single-cell DNA sequencing data. Understanding the strengths and limitations unique to each of these methods is essential for obtaining accurate copy number profiles from such data. Here we review the major steps these methods follow when analyzing single-cell DNA sequencing data, and then review the strengths and limitations of each method individually. Based on how they segment the genome into regions of differing copy number, we categorize the methods into three groups, select a commonly used representative method from each group, and benchmark the three on simulated as well as real datasets. While single-cell DNA sequencing is very promising for elucidating and understanding CNAs, even the best existing method does not exceed 80% accuracy. New methods that significantly improve upon the accuracy of these three methods are needed. Furthermore, given the large datasets now being generated, such methods must also be computationally efficient.
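To make concrete what segmenting the genome into regions of differing copy number involves, here is a minimal Python sketch of one generic approach: recursive binary segmentation of a single cell's binned read counts under a fixed penalty, followed by rounding each segment's mean to an integer copy number. It is not any of the three benchmarked methods. It assumes the counts have already been binned and corrected for biases such as GC content, and it fixes the cell's baseline ploidy at 2 rather than estimating it. The function names and the `penalty`, `ploidy` and `min_size` parameters are hypothetical choices for illustration.

```python
# Illustrative sketch only (hypothetical helper names). Assumes binned,
# bias-corrected read counts for one cell and a fixed baseline ploidy.
from statistics import mean


def best_split(x, lo, hi, min_size):
    """Find the breakpoint k in x[lo:hi] that most reduces the within-segment
    sum of squared errors (SSE). Returns (gain, k), or None if no split is allowed."""
    seg = x[lo:hi]
    n = len(seg)
    total = sum(seg)
    total_sq = sum(v * v for v in seg)
    sse_whole = total_sq - total * total / n
    best = None
    left = left_sq = 0.0
    for i in range(1, n):
        # Running sums so that `left` covers seg[0:i].
        left += seg[i - 1]
        left_sq += seg[i - 1] ** 2
        if i < min_size or n - i < min_size:
            continue  # both sides must contain at least min_size bins
        right, right_sq = total - left, total_sq - left_sq
        sse_split = (left_sq - left * left / i) + (right_sq - right * right / (n - i))
        gain = sse_whole - sse_split
        if best is None or gain > best[0]:
            best = (gain, lo + i)
    return best


def binary_segmentation(x, penalty, min_size=3):
    """Recursively split x while a split reduces SSE by more than `penalty`.
    Returns a list of half-open (start, end) bin intervals."""
    breakpoints = []
    stack = [(0, len(x))]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2 * min_size:
            continue  # too short to split into two valid segments
        split = best_split(x, lo, hi, min_size)
        if split is not None and split[0] > penalty:
            k = split[1]
            breakpoints.append(k)
            stack.extend([(lo, k), (k, hi)])
    bounds = [0] + sorted(breakpoints) + [len(x)]
    return list(zip(bounds[:-1], bounds[1:]))


def call_copy_numbers(counts, penalty=0.5, ploidy=2, min_size=3):
    """Segment one cell's binned read counts and assign an integer copy number
    to each segment, assuming the cell-wide mean depth corresponds to `ploidy`."""
    avg = mean(counts)
    ratios = [c / avg for c in counts]
    return [(lo, hi, round(ploidy * mean(ratios[lo:hi])))
            for lo, hi in binary_segmentation(ratios, penalty, min_size)]


# Toy cell: 30 bins with a gain in bins 10-19 and a loss in bins 20-29.
counts = [100] * 10 + [150] * 10 + [50] * 10
print(call_copy_numbers(counts))  # [(0, 10, 2), (10, 20, 3), (20, 30, 1)]
```

Raising `penalty` suppresses small, noise-driven breakpoints at the cost of missing short CNAs; on sparse, noisy single-cell data this trade-off is one reason segmentation is difficult.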
