ERDS-Exome: A Hybrid Approach for Copy Number Variant Detection from Whole-Exome Sequencing Data

Copy number variants (CNVs) play important roles in human disease and evolution. With the rapid development of next-generation sequencing technologies, many tools have been developed to infer CNVs from whole-exome sequencing (WES) data. However, because exons are sparsely distributed across the genome, the WES technique has inherent limitations, and WES data carry high levels of signal noise, the efficacy of these tools remains less than desirable. An effective tool with greater power for CNV discovery from WES data is therefore needed. In the present study, we describe a novel method, Estimation by Read Depth (RD) with Single-nucleotide variants from exome sequencing data (ERDS-exome). ERDS-exome employs a hybrid normalization approach to normalize WES data, and it incorporates RD and single-nucleotide variation information together as a hybrid signal into a paired hidden Markov model (HMM) to infer CNVs. In systematic evaluations on real data from the 1000 Genomes Project, compared with other state-of-the-art tools, ERDS-exome demonstrated higher sensitivity and comparable or even better specificity. ERDS-exome is publicly available at: https://erds-exome.github.io.
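The abstract names the ingredients (normalized read depth, single-nucleotide variant information, a hidden Markov model) but not the model itself. As a rough illustration only, the Python sketch below decodes per-exon copy numbers with a plain, single-sequence HMM and standard Viterbi decoding; it is not the paired HMM or the hybrid normalization used by ERDS-exome. Everything in it is an assumption: the copy-number states 0 to 4, the Poisson read-depth emission whose mean scales with copy number, the penalty `het_error` for a heterozygous SNV observed in a state with fewer than two copies, the stay probability `p_stay`, and the hypothetical helper names `emission` and `viterbi`. Input depths are assumed to be normalized already.

```python
import math
from typing import List, Sequence

# Hypothetical copy-number states; not the state space used by ERDS-exome.
COPY_STATES = [0, 1, 2, 3, 4]


def log_poisson(k: float, lam: float) -> float:
    """Log-probability of k reads under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def emission(depth: float, expected: float, has_het_snv: bool, cn: int,
             het_error: float = 1e-3) -> float:
    """Hybrid log-emission: a read-depth term plus a heterozygous-SNV term.

    Assumed model (illustrative only): Poisson depth whose mean scales with
    copy number, a small background mean for CN 0, and a log-penalty for a
    heterozygous SNV in a state with fewer than two copies.
    """
    lam = max(expected * cn / 2.0, 0.05 * expected)
    rd_term = log_poisson(depth, lam)
    snv_term = math.log(het_error) if (has_het_snv and cn < 2) else 0.0
    return rd_term + snv_term


def viterbi(depths: Sequence[float], expected: Sequence[float],
            het_snv: Sequence[bool], p_stay: float = 0.999) -> List[int]:
    """Most likely copy-number path over consecutive exons (Viterbi decoding)."""
    if not (len(depths) == len(expected) == len(het_snv)):
        raise ValueError("inputs must have one entry per exon")
    if not depths:
        return []
    n = len(COPY_STATES)
    log_stay = math.log(p_stay)
    log_switch = math.log((1.0 - p_stay) / (n - 1))  # uniform over other states
    log_start = math.log(1.0 / n)                    # uniform initial distribution

    def trans(prev: int, cur: int) -> float:
        return log_stay if prev == cur else log_switch

    # scores[s]: best log-probability of any state path ending in state s.
    scores = [log_start + emission(depths[0], expected[0], het_snv[0], cn)
              for cn in COPY_STATES]
    backpointers: List[List[int]] = []
    for i in range(1, len(depths)):
        new_scores, pointers = [], []
        for s, cn in enumerate(COPY_STATES):
            best = max(range(n), key=lambda p: scores[p] + trans(p, s))
            new_scores.append(scores[best] + trans(best, s)
                              + emission(depths[i], expected[i], het_snv[i], cn))
            pointers.append(best)
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [COPY_STATES[s] for s in path]


if __name__ == "__main__":
    depths = [98, 103, 52, 47, 55, 101, 97]  # per-exon read counts, assumed normalized
    expected = [100.0] * len(depths)         # expected diploid depth per exon
    het_snv = [True, False, False, False, False, True, False]
    print(viterbi(depths, expected, het_snv))  # → [2, 2, 1, 1, 1, 2, 2]
```

A heterozygous SNV implies at least two copies, which is why the SNV term in this sketch only penalizes the CN 0 and CN 1 states. With the toy input, the three low-depth exons are called as one single-copy deletion because their combined depth evidence outweighs the two state-switch penalties.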
