Comprehensively benchmarking applications for detecting copy number variation

Motivation: Copy number variation (CNV) has recently gained considerable interest as a type of genomic variation that plays an important role in complex phenotypes and disease susceptibility. Because many CNV detection methods have been developed in recent years, investigators need guidance in choosing methods suited to their objectives. This study therefore compared ten commonly used CNV detection applications, including CNVnator, ReadDepth, RDXplorer, LUMPY and Control-FREEC, benchmarking them by sensitivity, specificity and computational demands. Using the DGV gold standard variants as the reference dataset, we evaluated the ten applications on real sequencing data at sequencing depths from 5X to 50X. Among the ten methods benchmarked, LUMPY performs best, combining high sensitivity and high specificity at each sequencing depth. When high specificity is the priority, Canvas is also a good choice. When high sensitivity is preferred, CNVnator and RDXplorer are better choices. Additionally, CNVnator and GROM-RD perform well on low-depth sequencing data. Our results provide a comprehensive performance evaluation of the selected CNV detection methods and should facilitate the future development and improvement of CNV prediction methods.
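To make the benchmarking idea concrete, the following is a minimal sketch of how calls from one tool could be scored against a gold standard set. It is not the procedure used in this study: the 50% reciprocal-overlap matching rule, the function names `reciprocal_overlap` and `benchmark`, and the use of precision as a stand-in for specificity are all assumptions made for illustration, since the abstract does not state how matches or specificity were defined.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical, simplified CNV record
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if the intervals do not overlap)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def benchmark(calls: List[Interval], gold: List[Interval], min_ro: float = 0.5) -> Tuple[float, float]:
    """Score a call set against a gold standard.

    Assumption: a call matches a gold-standard variant when their reciprocal
    overlap is at least `min_ro` (0.5 here is an illustrative threshold).
    Returns (sensitivity, precision).
    """
    # Gold-standard variants recovered by at least one call
    matched_gold = sum(
        1 for g in gold if any(reciprocal_overlap(g, c) >= min_ro for c in calls)
    )
    # Calls supported by at least one gold-standard variant
    matched_calls = sum(
        1 for c in calls if any(reciprocal_overlap(c, g) >= min_ro for g in gold)
    )
    sensitivity = matched_gold / len(gold) if gold else 0.0
    precision = matched_calls / len(calls) if calls else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Toy data, invented for this example only
    gold = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 300)]
    calls = [("chr1", 110, 210), ("chr1", 1000, 1100)]
    print(benchmark(calls, gold))
```

The all-pairs comparison is quadratic and only suitable for small inputs; a real comparison over genome-wide call sets would more likely sort intervals or use an interval index.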
