A Bayesian segmentation approach to ascertain copy number variations at the population level

MOTIVATION: Efficient and accurate ascertainment of copy number variations (CNVs) at the population level is essential for understanding evolutionary processes and population genetics, and for using CNVs in population-based genome-wide association studies of complex human diseases. We propose a novel Bayesian segmentation approach to identify CNVs in a defined population of any size. It is computationally efficient and quantifies the statistical evidence for each detected CNV through the Bayes factor. A unique feature of the approach is that it carries out segmentation and assigns copy number status simultaneously, a desirable property not shared by current segmentation methods.

RESULTS: In benchmark simulation studies comparing it with popular two-step segmentation methods for a single individual, the new approach performs competitively with respect to false discovery rate and sensitivity in breakpoint detection. In a simulation study of multiple samples with recurrent CNVs, it outperforms two leading single-sample methods. We further demonstrate its effectiveness in a population-level analysis of previously published HapMap data, and we also apply it to study the population genetics of CNVs.

AVAILABILITY: R programs are available at http://www.mshri.on.ca/mitacs/software/SOFTWARE.HTML
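The abstract states that each detected CNV is accompanied by a Bayes factor, and that segmentation and copy number assignment happen in a single step. It does not spell out the model. The Python sketch below is a deliberately simplified toy illustration of those two ideas. It is not the authors' method. It assumes Gaussian probe log2 ratios with known noise `sigma` and a normal prior of width `tau` around hypothetical per-state means in `STATE_MEANS`, and it scans for a single segment only. All names and parameter values (`best_segment`, `min_len`, the state means) are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical copy-number states and the prior mean log2 ratio assumed for each.
# These values are illustrative assumptions, not taken from the paper.
STATE_MEANS: Dict[str, float] = {"loss": -0.6, "normal": 0.0, "gain": 0.45}


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def segment_log_marginals(ybar: float, n: int, sigma: float, tau: float) -> Dict[str, float]:
    """Log marginal likelihood of a segment under each state (up to a shared constant).

    Assumed model: probes y_i ~ N(mu, sigma^2) with known sigma, and
    mu ~ N(m_k, tau^2) under state k. The part of the likelihood that does not
    involve mu is identical across states, so it cancels in Bayes factors and
    only the sample mean matters: ybar ~ N(m_k, tau^2 + sigma^2 / n).
    """
    var = tau ** 2 + sigma ** 2 / n
    return {k: log_normal_pdf(ybar, m, var) for k, m in STATE_MEANS.items()}


def best_segment(y: List[float], sigma: float = 0.25, tau: float = 0.1,
                 min_len: int = 3) -> Tuple[int, int, str, float]:
    """Score every interval [i, j) and return the one whose best non-normal
    state has the largest log Bayes factor against the 'normal' state.

    The interval (segmentation) and the state (copy number status) are chosen
    in the same step. Returns (i, j, state, log_bf).
    """
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)  # prefix sums give O(1) segment means
    best = (0, 0, "normal", -math.inf)
    for i in range(len(y)):
        for j in range(i + min_len, len(y) + 1):
            n = j - i
            ybar = (prefix[j] - prefix[i]) / n
            logm = segment_log_marginals(ybar, n, sigma, tau)
            for state, lm in logm.items():
                if state == "normal":
                    continue
                log_bf = lm - logm["normal"]  # log Bayes factor vs. normal
                if log_bf > best[3]:
                    best = (i, j, state, log_bf)
    return best
```

On the noise-free toy profile `[0.0] * 10 + [0.5] * 8 + [0.0] * 10`, `best_segment` returns the interval `(10, 18)` labelled `"gain"` with a positive log Bayes factor. Scoring every interval costs O(n²) and finds only one segment. A practical method would need to handle multiple segments and, at the population level, share information across samples.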