Integrative DNA copy number detection and genotyping from sequencing and array-based platforms

Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies now perform CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, single-nucleotide polymorphism (SNP) array data from earlier genotyping are also available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. CNV detection from sequencing data alone can also be improved by using allele-specific reads.

Results: We propose a statistical framework, the integrated CNV (iCNV) detection algorithm, which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP-array and sequencing data. iCNV applies platform-specific normalization, uses allele-specific reads from sequencing, and integrates matched next-generation sequencing (NGS) and SNP-array data with a hidden Markov model. We compare integrated two-platform CNV detection using iCNV to the naïve intersection or union of single-platform calls and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data alone and show that using allele-specific reads improves CNV detection accuracy compared with existing methods.

Availability and implementation: https://github.com/zhouzilu/iCNV

Supplementary information: Supplementary data are available at Bioinformatics online.
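To make the idea of integrating two platforms in one hidden Markov model more concrete, the following is a minimal illustrative sketch, not the iCNV implementation. It assumes each platform has already been normalized to a per-probe intensity score centered near 0 for normal copy number, merges sequencing and array probes by genomic position, and runs a three-state Viterbi decoder with platform-specific noise. The function name `call_cnv_states`, the state means, the standard deviations, and the transition probability are all hypothetical choices made for illustration.

```python
import math

# All numeric values below are hypothetical, chosen only for illustration.
STATES = ("del", "normal", "dup")
MEANS = {"del": -1.0, "normal": 0.0, "dup": 1.0}   # assumed mean score per state
SD = {"seq": 0.5, "array": 0.3}                    # assumed platform-specific noise
LOG_INIT = {"del": math.log(0.01), "normal": math.log(0.98), "dup": math.log(0.01)}


def log_gauss(x, mu, sd):
    """Log density of a normal distribution with mean mu and std dev sd."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def call_cnv_states(observations, stay=0.98):
    """Viterbi decoding over probes from both platforms merged by position.

    observations: iterable of (position, platform, score) tuples, where
    platform is "seq" or "array". Returns a list of (position, state).
    """
    obs = sorted(observations)  # tuples sort by position first
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (len(STATES) - 1))

    _, platform, value = obs[0]
    score = {s: LOG_INIT[s] + log_gauss(value, MEANS[s], SD[platform]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at probe i + 1

    for _, platform, value in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best previous state, accounting for the cost of switching.
            prev = max(STATES, key=lambda p: score[p] + (log_stay if p == s else log_move))
            trans = log_stay if prev == s else log_move
            new_score[s] = score[prev] + trans + log_gauss(value, MEANS[s], SD[platform])
            pointers[s] = prev
        back.append(pointers)
        score = new_score

    # Trace back the most likely state path from the last probe.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [(pos, st) for (pos, _, _), st in zip(obs, path)]
```

Because every probe contributes its own emission term with its platform's noise level, a segment supported by both platforms accumulates evidence from each, rather than requiring separate per-platform calls to be intersected or merged afterwards. The actual iCNV model, including its normalization and its use of allele-specific reads, is described in the paper and the repository linked above.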