Free-access copy-number variant detection tools for targeted next-generation sequencing data.

Copy number variants (CNVs) are intermediate-scale structural variants in which DNA fragments of between 1 kb and 5 Mb change in copy number. Although CNVs are known to account for a significant proportion of the genetic burden in human disease, their role (especially that of small CNVs) is often underestimated, as they are undetectable by traditional Sanger sequencing. Since the development of next-generation sequencing (NGS) technologies, several research groups have compared depth of coverage (DoC) patterns between samples, an approach that may facilitate effective CNV detection. Most CNV detection tools based on DoC comparisons are designed to work with whole-genome sequencing (WGS) or whole-exome sequencing (WES) data. However, few methods developed to date are designed for custom/commercial targeted NGS (tg-NGS) panels, the assays most commonly used for diagnostic purposes. Moreover, the development and evaluation of these tools are hindered by (i) the scarcity of thoroughly annotated data containing CNVs and (ii) a dearth of simulation tools for WES and tg-NGS that mimic the errors and biases encountered in these data. Here, we review DoC-based CNV detection methods described in the current literature, assess their performance with simulated tg-NGS data, and discuss their strengths and weaknesses when integrated into the daily laboratory workflow. Our findings suggest that the best methods for CNV detection in tg-NGS panels are DECoN, ExomeDepth, and ExomeCNV. Regardless of the method used, these programs need to be made more user-friendly so that diagnostic laboratory staff who lack bioinformatics training can use them.
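To make the idea of a DoC comparison concrete, the following is a minimal sketch and not the algorithm of any tool named above. It assumes per-target read counts are already available for a test sample and a reference sample, scales each by its total read count, and flags targets whose coverage ratio departs from 1. The function name `call_cnvs`, the target names, the counts, and the cut-offs of 0.75 and 1.25 are all hypothetical choices made for illustration; published tools fit statistical models to the read counts rather than applying fixed thresholds.

```python
def call_cnvs(test_counts, ref_counts, del_ratio=0.75, dup_ratio=1.25):
    """Classify each target by its normalized depth-of-coverage ratio.

    test_counts, ref_counts: dicts mapping target name -> read count.
    del_ratio, dup_ratio: illustrative cut-offs (assumptions, not
    values taken from any published tool).
    Returns a dict mapping target name -> (call, ratio).
    """
    test_total = sum(test_counts.values())
    ref_total = sum(ref_counts.values())
    calls = {}
    for target, ref_c in ref_counts.items():
        test_c = test_counts.get(target, 0)
        if ref_c == 0 or test_total == 0:
            # No reference coverage (or an empty test sample): no call.
            calls[target] = ("no_call", None)
            continue
        # Ratio of library-size-normalized depths, test over reference.
        ratio = (test_c * ref_total) / (ref_c * test_total)
        if ratio < del_ratio:
            call = "deletion"
        elif ratio > dup_ratio:
            call = "duplication"
        else:
            call = "normal"
        calls[target] = (call, ratio)
    return calls


# Hypothetical read counts for five targets of a panel.
reference = {"ex1": 1000, "ex2": 1000, "ex3": 1000, "ex4": 1000, "ex5": 1000}
test = {"ex1": 1000, "ex2": 500, "ex3": 1000, "ex4": 1500, "ex5": 1000}

for name, (call, ratio) in call_cnvs(test, reference).items():
    print(name, call, ratio)
```

In this toy input, a heterozygous deletion would be expected to give a ratio near 0.5 and a single-copy duplication a ratio near 1.5, which is why the illustrative cut-offs sit midway between those values and 1. Real tg-NGS data also carry GC-content and capture biases that a sketch like this does not correct.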
