Estimating Genome-Wide Copy Number Using Allele-Specific Mixture Models

Genomic changes such as copy number alterations are among the major underlying causes of human phenotypic variation in normal and disease subjects. Array comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology provides a resolution of only >30 kb, which limits its ability to detect copy number alterations spanning small regions. Higher resolution technologies such as single nucleotide polymorphism (SNP) microarrays allow detection of copy number alterations as small as several thousand base pairs. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. Various groups have proposed statistical procedures that improve precision by pooling data from neighboring locations. However, these procedures need to average across relatively large regions to work effectively, which greatly reduces resolution. Recently, regression-type models that account for probe effects have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution, specifically designed for single-point estimation, that provides various advantages over the existing methodology. We use a 314-sample database to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy number. We can then compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (www.bioconductor.org).
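To illustrate how posterior probabilities can serve as both a prediction rule and a confidence measure, the following minimal sketch applies Bayes' rule to a single observed intensity. It is not the model fitted in the paper: the Gaussian form of the conditional distributions, the use of total rather than allele-specific copy number, the function names, and all numerical values (means, standard deviations, priors) are assumptions chosen purely for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (an assumed form for the conditionals)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, means, sds, priors):
    """Posterior probability of each copy number state given intensity x.

    Bayes' rule: P(state | x) is proportional to P(state) * f(x | state).
    """
    weighted = [p * normal_pdf(x, m, s) for m, s, p in zip(means, sds, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


def call_copy_number(x, states, means, sds, priors):
    """Return the most probable state and its posterior as a confidence value."""
    post = posterior_probabilities(x, means, sds, priors)
    best = max(range(len(states)), key=lambda i: post[i])
    return states[best], post[best]


# Hypothetical parameters on a log2 intensity scale (illustrative only).
states = [0, 1, 2, 3]
means = [8.0, 10.0, 11.0, 11.6]
sds = [0.5, 0.3, 0.3, 0.3]
priors = [0.01, 0.04, 0.90, 0.05]

call, confidence = call_copy_number(11.1, states, means, sds, priors)
print(call, round(confidence, 3))
```

The call is the state with the largest posterior probability, and that probability doubles as the confidence attached to the call; in practice the conditional distributions would be estimated from training data rather than fixed by hand.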
