cnvOffSeq: detecting intergenic copy number variation using off-target exome sequencing data

Motivation: Exome sequencing technologies have transformed the field of Mendelian genetics and allowed for efficient detection of genomic variants in protein-coding regions. The target enrichment process that is intrinsic to exome sequencing is inherently imperfect, generating large amounts of unintended off-target sequence. Off-target data are characterized by very low and highly heterogeneous coverage and are usually discarded by exome analysis pipelines. We posit that off-target read depth is a rich, but overlooked, source of information that could be mined to detect intergenic copy number variation (CNV). We propose cnvOffseq, a novel normalization framework for off-target read depth that is based on local adaptive singular value decomposition (SVD). This method is designed to address the heterogeneity of the underlying data and allows for accurate and precise CNV detection and genotyping in off-target regions. Results: cnvOffSeq was benchmarked on whole-exome sequencing samples from the 1000 Genomes Project. In a set of 104 gold standard intergenic deletions, our method achieved a sensitivity of 57.5% and a specificity of 99.2%, while maintaining a low FDR of 5%. For gold standard deletions longer than 5 kb, cnvOffSeq achieves a sensitivity of 90.4% without increasing the FDR. cnvOffSeq outperforms both whole-genome and whole-exome CNV detection methods considerably and is shown to offer a substantial improvement over naïve local SVD. Availability and Implementation: cnvOffSeq is available at http://sourceforge.net/p/cnvoffseq/ Contact: evangelos.bellos09@imperial.ac.uk or l.coin@imb.uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.

[1]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[2]  Richard Caswell,et al.  Next generation sequencing of chromosomal rearrangements in patients with split-hand/split-foot malformation provides evidence for DYNC1I1 exonic enhancers of DLX5/6 expression in humans , 2014, Journal of Medical Genetics.

[3]  P. Shannon,et al.  Exome sequencing identifies the cause of a Mendelian disorder , 2009, Nature Genetics.

[4]  Bradley P. Coe,et al.  Copy number variation detection and genotyping from exome sequence data , 2012, Genome research.

[5]  Xin Jin,et al.  An exome sequencing pipeline for identifying and genotyping common CNVs associated with disease with application to psoriasis , 2012, Bioinform..

[6]  Sara B. Linker,et al.  Comparison of Three Targeted Enrichment Strategies on the SOLiD Sequencing Platform , 2011, PloS one.

[7]  Evangelos Bellos,et al.  cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data , 2012, Genome Biology.

[8]  E. Banks,et al.  Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. , 2012, American journal of human genetics.

[9]  Nancy F. Hansen,et al.  Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry , 2008, Nature.

[10]  Nick C Fox,et al.  Exome sequencing reveals a novel partial deletion in the progranulin gene causing primary progressive aphasia , 2013, Journal of Neurology, Neurosurgery & Psychiatry.

[11]  M. Gerstein,et al.  CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. , 2011, Genome research.

[12]  Christian Gilissen,et al.  De novo mutations of SETBP1 cause Schinzel-Giedion syndrome , 2010, Nature Genetics.

[13]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[14]  Emily H Turner,et al.  Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome , 2010, Nature Genetics.

[15]  J. Shendure,et al.  Exome sequencing as a tool for Mendelian disease gene discovery , 2011, Nature Reviews Genetics.

[16]  John Quackenbush,et al.  Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV , 2011, Bioinform..

[17]  J. Long,et al.  Exome sequencing generates high quality data in non-target regions , 2012, BMC Genomics.