Evaluation of somatic copy number estimation tools for whole-exome sequencing data

Whole-exome sequencing (WES) has become a standard method for detecting genetic variants in human diseases. Although the primary use of WES data has been the identification of single nucleotide variations and indels, these data also offer a possibility of detecting copy number variations (CNVs) at high resolution. However, WES data have uneven read coverage along the genome owing to the target capture step, and the development of a robust WES-based CNV tool is challenging. Here, we evaluate six WES somatic CNV detection tools: ADTEx, CONTRA, Control-FREEC, EXCAVATOR, ExomeCNV and Varscan2. Using WES data from 50 kidney chromophobe, 50 bladder urothelial carcinoma, and 50 stomach adenocarcinoma patients from The Cancer Genome Atlas, we compared the CNV calls from the six tools with a reference CNV set that was identified by both single nucleotide polymorphism array 6.0 and whole-genome sequencing data. We found that these algorithms gave highly variable results: visual inspection reveals significant differences between the WES-based segmentation profiles and the reference profile, as well as among the WES-based profiles. Using a 50% overlap criterion, 13-77% of WES CNV calls were covered by CNVs from the reference set, up to 21% of the copy gains were called as losses or vice versa, and dramatic differences in CNV sizes and CNV numbers were observed. Overall, ADTEx and EXCAVATOR had the best performance with relatively high precision and sensitivity. We suggest that the current algorithms for somatic CNV detection from WES data are limited in their performance and that more robust algorithms are needed.

[1]  M. Wigler,et al.  Circular binary segmentation for the analysis of array-based DNA copy number data. , 2004, Biostatistics.

[2]  S. Gabriel,et al.  Pan-cancer patterns of somatic copy-number alteration , 2013, Nature Genetics.

[3]  Henry M. Wood,et al.  Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data , 2012, Bioinform..

[4]  Christopher A. Miller,et al.  VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. , 2012, Genome research.

[5]  Lawrence A. Donehower,et al.  The somatic genomic landscape of chromophobe renal cell carcinoma. , 2014, Cancer cell.

[6]  P. Shannon,et al.  Exome sequencing identifies the cause of a Mendelian disorder , 2009, Nature Genetics.

[7]  Peter J. Park,et al.  CGHweb: a tool for comparing DNA copy number segmentations from multiple algorithms , 2008, Bioinform..

[8]  Ying Sheng,et al.  Identification of copy number variants from exome sequence data , 2014, BMC Genomics.

[9]  M. DePristo,et al.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.

[10]  Yufeng Shen,et al.  CANOES: detecting rare copy number variants from whole exome sequencing data , 2014, Nucleic acids research.

[11]  Steven J. M. Jones,et al.  Comprehensive molecular characterization of urothelial bladder carcinoma , 2014, Nature.

[12]  D. Conrad,et al.  Global variation in copy number in the human genome , 2006, Nature.

[13]  Derek Y. Chiang,et al.  The landscape of somatic copy-number alteration across human cancers , 2010, Nature.

[14]  Jacek Majewski,et al.  FishingCNV: a graphical software package for detecting rare copy number variations in exome-sequencing data , 2013, Bioinform..

[15]  Martin Vingron,et al.  Statistical Applications in Genetics and Molecular Biology Modeling Read Counts for CNV Detection in Exome Sequencing Data , 2011 .

[16]  Jason Li,et al.  CONTRA: copy number analysis for targeted resequencing , 2012, Bioinform..

[17]  Steven J. M. Jones,et al.  Comprehensive molecular characterization of gastric adenocarcinoma , 2014, Nature.

[18]  Peter J. Park,et al.  Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data , 2005, Bioinform..

[19]  Sampsa Hautaniemi,et al.  Comparative analysis of methods for identifying somatic copy number alterations from deep sequencing data , 2015, Briefings Bioinform..

[20]  A. Shlien,et al.  Copy number variations and cancer , 2009, Genome Medicine.

[21]  Jun Yokota,et al.  Molecular processes of chromosome 9p21 deletions in human cancers , 2003, Oncogene.

[22]  Bradley P. Coe,et al.  Copy number variation detection and genotyping from exome sequence data , 2012, Genome research.

[23]  Nicholas W. Wood,et al.  A robust model for read count data in exome sequencing experiments and implications for copy number variant calling , 2012, Bioinform..

[24]  E. Banks,et al.  Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. , 2012, American journal of human genetics.

[25]  Kenny Q. Ye,et al.  Large-Scale Copy Number Polymorphism in the Human Genome , 2004, Science.

[26]  Mark D. Johnson,et al.  Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion , 2011, Proceedings of the National Academy of Sciences.

[27]  Xutao Deng,et al.  SeqGene: a comprehensive software solution for mining exome- and transcriptome- sequencing data , 2011, BMC Bioinformatics.

[28]  John Quackenbush,et al.  Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV , 2011, Bioinform..

[29]  L. Feuk,et al.  Detection of large-scale variation in the human genome , 2004, Nature Genetics.

[30]  Xin Jin,et al.  An exome sequencing pipeline for identifying and genotyping common CNVs associated with disease with application to psoriasis , 2012, Bioinform..

[31]  S. Hochreiter,et al.  cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate , 2012, Nucleic acids research.

[32]  Franck Sturtz,et al.  The 50th anniversary of the discovery of trisomy 21: The past, present, and future of research and treatment of Down syndrome , 2009, Genetics in Medicine.

[33]  B. Giusti,et al.  EXCAVATOR: detecting copy number variants from whole-exome sequencing data , 2013, Genome Biology.

[34]  Tatiana Popova,et al.  Supplementary Methods , 2012, Acta Neuropsychiatrica.

[35]  Yan Guo,et al.  Comparative Study of Exome Copy Number Variation Estimation Tools Using Array Comparative Genomic Hybridization as Control , 2013, BioMed research international.

[36]  Ryan Mills,et al.  Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants , 2011, Nature Biotechnology.

[37]  Andrew Collins,et al.  Exome sequence read depth methods for identifying copy number changes , 2015, Briefings Bioinform..

[38]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[39]  Xiaolin Zhu,et al.  An Evaluation of Copy Number Variation Detection Tools from Whole‐Exome Sequencing Data , 2014, Human mutation.

[40]  S. Halgamuge,et al.  Inferring copy number and genotype in tumour exome data , 2014, BMC Genomics.