GROM-RD: resolving genomic biases to improve read depth detection of copy number variants

Amplifications or deletions of genome segments, known as copy number variants (CNVs), have been associated with many diseases. Read depth analysis of next-generation sequencing (NGS) is an essential method of detecting CNVs. However, genome read coverage is frequently distorted by various biases of NGS platforms, which reduce predictive capabilities of existing approaches. Additionally, the use of read depth tools has been somewhat hindered by imprecise breakpoint identification. We developed GROM-RD, an algorithm that analyzes multiple biases in read coverage to detect CNVs in NGS data. We found non-uniform variance across distinct GC regions after using existing GC bias correction methods and developed a novel approach to normalize such variance. Although complex and repetitive genome segments complicate CNV detection, GROM-RD adjusts for repeat bias and uses a two-pipeline masking approach to detect CNVs in complex and repetitive segments while improving sensitivity in less complicated regions. To overcome a typical weakness of RD methods, GROM-RD employs a CNV search using size-varying overlapping windows to improve breakpoint resolution. We compared our method to two widely used programs based on read depth methods, CNVnator and RDXplorer, and observed improved CNV detection and breakpoint accuracy for GROM-RD. GROM-RD is available at http://grigoriev.rutgers.edu/software/.

[1]  Che-Hsin Lin,et al.  Integrated optical-fiber capillary electrophoresis microchips with novel spin-on-glass surface modification. , 2004, Biosensors & bioelectronics.

[2]  Philip M. Kim,et al.  Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome , 2007, Science.

[3]  D. Pinto,et al.  Structural variation of chromosomes in autism spectrum disorder. , 2008, American journal of human genetics.

[4]  Chao Xie,et al.  CNV-seq, a new method to detect copy number variation using high-throughput sequencing , 2009, BMC Bioinformatics.

[5]  Nancy F. Hansen,et al.  Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry , 2008, Nature.

[6]  Martin Vingron,et al.  Mapping translocation breakpoints by next-generation sequencing. , 2008, Genome research.

[7]  A. Børresen-Dale,et al.  COMPLEX LANDSCAPES OF SOMATIC REARRANGEMENT IN HUMAN BREAST CANCER GENOMES , 2009, Nature.

[8]  Kenny Q. Ye,et al.  Sensitive and accurate detection of copy number variants using read depth of coverage. , 2009, Genome research.

[9]  Peter J. Park,et al.  rSW-seq: Algorithm for detection of copy number alterations in deep sequencing data , 2010, BMC Bioinformatics.

[10]  Derek Y. Chiang,et al.  High-resolution mapping of copy-number alterations with massively parallel sequencing , 2009, Nature Methods.

[11]  Pall I. Olason,et al.  Common variants conferring risk of schizophrenia , 2009, Nature.

[12]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[13]  Hugo Y. K. Lam,et al.  Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library , 2010, Nature Biotechnology.

[14]  Andrew Menzies,et al.  The patterns and dynamics of genomic instability in metastatic pancreatic cancer , 2010, Nature.

[15]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[16]  T. Fennell,et al.  Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries , 2011, Genome Biology.

[17]  Simon Tavaré,et al.  CNAseg - a novel framework for identification of copy number changes in cancer from second-generation sequencing data , 2010, Bioinform..

[18]  M. DePristo,et al.  A framework for variation discovery and genotyping using next-generation DNA sequencing data , 2011, Nature Genetics.

[19]  Seungtai Yoon,et al.  Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm , 2011, Nucleic acids research.

[20]  Helga Thorvaldsdóttir,et al.  Integrative Genomics Viewer , 2011, Nature Biotechnology.

[21]  John Quackenbush,et al.  Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV , 2011, Bioinform..

[22]  Christopher A. Miller,et al.  ReadDepth: A Parallel R Package for Detecting Copy Number Alterations from Short Sequencing Reads , 2011, PloS one.

[23]  Kenny Q. Ye,et al.  Mapping copy number variation by population scale genome sequencing , 2010, Nature.

[24]  M. Gerstein,et al.  CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. , 2011, Genome research.

[25]  Z. Ou,et al.  Observation and prediction of recurrent human translocations mediated by NAHR between nonhomologous chromosomes. , 2011, Genome research.

[26]  Eric S. Lander,et al.  The genomic complexity of primary human prostate cancer , 2010, Nature.

[27]  Emmanuel Barillot,et al.  Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization , 2010, Bioinform..

[28]  Y. Benjamini,et al.  Summarizing and correcting the GC content bias in high-throughput sequencing , 2012, Nucleic acids research.

[29]  Thomas Zichner,et al.  DELLY: structural variant discovery by integrated paired-end and split-read analysis , 2012, Bioinform..

[30]  Zhen Yue,et al.  pIRS: Profile-based Illumina pair-end reads simulator , 2012, Bioinform..

[31]  Monya Baker,et al.  Structural variation: the genome's hidden architecture , 2012, Nature Methods.

[32]  S. Hochreiter,et al.  cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate , 2012, Nucleic acids research.

[33]  E. Banks,et al.  Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. , 2012, American journal of human genetics.

[34]  N. Lennon,et al.  Characterizing and measuring bias in sequence data , 2013, Genome Biology.

[35]  Kenny Q. Ye,et al.  An integrated map of genetic variation from 1,092 human genomes , 2012, Nature.

[36]  Jason Li,et al.  CONTRA: copy number analysis for targeted resequencing , 2012, Bioinform..

[37]  Henry M. Wood,et al.  Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data , 2012, Bioinform..

[38]  Bradley P. Coe,et al.  Copy number variation detection and genotyping from exome sequence data , 2012, Genome research.

[39]  Andy Wing Chun Pang,et al.  Mechanisms of Formation of Structural Variation in a Fully Sequenced Human Genome , 2013, Human mutation.

[40]  B. Giusti,et al.  EXCAVATOR: detecting copy number variants from whole-exome sequencing data , 2013, Genome Biology.

[41]  Martin Dugas,et al.  RSVSim: an R/Bioconductor package for the simulation of structural variations , 2013, Bioinform..

[42]  C. Ponting,et al.  Sequencing depth and coverage: key considerations in genomic analyses , 2014, Nature Reviews Genetics.