Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model

Background
Copy number variants (CNVs) have been shown to occur at high frequency and are now widely believed to contribute significantly to phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and the newly developed read-depth approach based on ultrahigh-throughput genomic sequencing both provide rapid, robust, and comprehensive ways to identify CNVs on a whole-genome scale.

Results
We developed a Bayesian statistical analysis algorithm for detecting CNVs from both types of genomic data. The algorithm can analyze data obtained from PCR-based bacterial artificial chromosome (BAC) arrays, high-density oligonucleotide arrays, and more recently developed high-throughput DNA sequencing. Our approach treats the parameters that define the underlying data-generating process (e.g., the number of CNVs, the position of each CNV, and the data noise level) as random variables, and derives the posterior distribution of the genomic CNV structure given the observed data. Sampling from this posterior distribution with a Markov chain Monte Carlo method yields not only point estimates for these unknown parameters but also Bayesian credible intervals for the estimates. We illustrate the characteristics of our algorithm by applying it to both synthetic and experimental data sets and comparing it with other segmentation algorithms.

Conclusions
The synthetic data comparison shows, in particular, that our method is more sensitive than other approaches at low false positive rates. Furthermore, given its Bayesian formulation, our method can also serve as a technique to refine CNVs identified by fast point-estimate methods, and as a framework to integrate array-CGH and sequencing data with other CNV-related biological knowledge, all through informative priors.
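The general idea described above (treating the parameters of the data-generating process as random variables, exploring their posterior with Markov chain Monte Carlo, and reading credible intervals off the draws) can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm. It assumes a single CNV per sequence (the published model also treats the number of CNVs as random), a zero baseline for normalized log-ratios, Gaussian noise, and conjugate priors; the function name `sample_single_cnv` and all prior settings are hypothetical. The sampler is Metropolis-within-Gibbs: conjugate draws for the CNV amplitude and noise variance, plus symmetric random-walk Metropolis moves for the two breakpoints.

```python
import numpy as np


def sample_single_cnv(y, n_iter=5000, burn_in=1000, step=3, min_len=2,
                      tau=2.0, a0=2.0, b0=0.1, seed=0):
    """Hypothetical one-CNV model, assumed for illustration only:
    y_i ~ N(mu * 1[s <= i < e], sigma^2), uniform prior on breakpoints (s, e),
    mu ~ N(0, tau^2), sigma^2 ~ InvGamma(a0, b0).
    Returns retained draws as rows of (start, end, amplitude, noise sd)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    idx = np.arange(n)
    s, e, mu, sigma2 = n // 4, 3 * n // 4, 0.0, float(np.var(y))

    def ssr(s, e, mu):
        # Sum of squared residuals: mean mu inside [s, e), baseline 0 elsewhere.
        mean = np.where((idx >= s) & (idx < e), mu, 0.0)
        return float(np.sum((y - mean) ** 2))

    draws = []
    for it in range(n_iter):
        # Gibbs step for mu: conjugate normal update given the current segment.
        prec = (e - s) / sigma2 + 1.0 / tau**2
        mu = rng.normal(y[s:e].sum() / sigma2 / prec, np.sqrt(1.0 / prec))

        # Gibbs step for sigma^2: inverse-gamma update (reciprocal of a gamma draw).
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + ssr(s, e, mu) / 2))

        # Metropolis step for each breakpoint with a symmetric random walk.
        for which in ("s", "e"):
            delta = int(rng.integers(-step, step + 1))
            s_new, e_new = (s + delta, e) if which == "s" else (s, e + delta)
            if s_new < 0 or e_new > n or e_new - s_new < min_len:
                continue  # outside the prior support: reject the proposal
            # Uniform prior and symmetric proposal cancel; only the likelihood ratio remains.
            log_ratio = (ssr(s, e, mu) - ssr(s_new, e_new, mu)) / (2 * sigma2)
            if np.log(rng.random()) < log_ratio:
                s, e = s_new, e_new

        if it >= burn_in:
            draws.append((s, e, mu, np.sqrt(sigma2)))
    return np.array(draws)


# Usage on simulated log-ratios with one gain spanning probes 80..119.
rng = np.random.default_rng(42)
y = rng.normal(0.0, 0.3, 200)
y[80:120] += 1.0
draws = sample_single_cnv(y)
for k, name in enumerate(["start", "end", "amplitude", "noise sd"]):
    lo, hi = np.percentile(draws[:, k], [2.5, 97.5])
    print(f"{name}: median {np.median(draws[:, k]):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Credible intervals here are simply percentiles of the retained draws. Letting the number of CNVs vary, as the abstract describes, would require trans-dimensional moves (for example, reversible-jump MCMC), which this sketch omits.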
