Joint estimation of copy number variation and reference intensities on multiple DNA arrays using GADA

MOTIVATION: Many recently discovered copy number polymorphisms are far more complex than initially thought, which makes them harder to detect in the presence of significant measurement noise. In this scenario, performing normalization and segmentation as separate steps is prone to produce many false detections of copy number changes. New approaches that jointly model the copy number and the non-copy number (noise) hybridization effects across multiple samples could yield more accurate results.

METHODS: In this article, the genome alteration detection analysis (GADA) approach introduced in our previous work is extended to a multiple sample model. The copy number component is independent for each sample and uses a sparse Bayesian prior, while the reference hybridization level is not necessarily sparse but is identical across all samples. The expectation maximization (EM) algorithm used to fit the model iteratively determines whether the observed hybridization levels are more likely due to a copy number variation or to a shared hybridization bias.

RESULTS: The proposed approach is compared with the currently used strategy of separate normalization followed by independent segmentation of each array. Real microarray data obtained from HapMap samples are randomly partitioned to create different reference sets. With the new approach, copy number and reference intensity estimates are significantly less variable when the reference set changes, and the copy numbers detected within HapMap family trios are more consistent. Finally, the running time to fit the model grows linearly in the number of samples and probes.

AVAILABILITY: http://biron.usc.edu/~piquereg/GADA.
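The idea described under Methods, a reference level shared by all samples plus a sparse copy number component per sample, can be illustrated with a toy sketch. This is not the GADA implementation: the simulated data, the function names (`simulate`, `separate_normalization`, `joint_fit`, `rmse`), the hard threshold and all parameter values are assumptions made for illustration, and the sparse Bayesian prior and EM updates are replaced by simple alternating least squares with thresholding.

```python
import numpy as np


def simulate(n_samples=6, n_probes=200, noise_sd=0.05, seed=0):
    """Toy data: y[i, j] = reference[j] + copy_number[i, j] + noise[i, j]."""
    rng = np.random.default_rng(seed)
    reference = rng.normal(0.0, 0.5, size=n_probes)   # shared probe effect
    copy_number = np.zeros((n_samples, n_probes))     # sparse, per sample
    copy_number[0, 50:80] = 1.0                       # gain in samples 0 and 1
    copy_number[1, 50:80] = 1.0
    copy_number[2, 120:150] = -1.0                    # loss in sample 2
    noise = rng.normal(0.0, noise_sd, size=(n_samples, n_probes))
    return reference + copy_number + noise, reference, copy_number


def separate_normalization(y):
    """Baseline: the reference is the per-probe mean over all samples."""
    reference_hat = y.mean(axis=0)
    return reference_hat, y - reference_hat


def joint_fit(y, threshold=0.5, n_iter=20):
    """Alternate between the shared reference and per-sample sparse deviations.

    Hard thresholding is a crude stand-in for a sparsity prior (assumption).
    """
    reference_hat = y.mean(axis=0)
    for _ in range(n_iter):
        residual = y - reference_hat
        # Keep only large deviations as candidate copy number changes.
        copy_hat = np.where(np.abs(residual) > threshold, residual, 0.0)
        # Re-estimate the reference after removing the copy number component.
        reference_hat = (y - copy_hat).mean(axis=0)
    return reference_hat, copy_hat


def rmse(a, b):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


y, reference, copy_number = simulate()
ref_sep, _ = separate_normalization(y)
ref_joint, copy_joint = joint_fit(y)
print("reference RMSE, separate normalization:", rmse(ref_sep, reference))
print("reference RMSE, joint fit:", rmse(ref_joint, reference))
```

In this toy setting, the mean-based reference absorbs part of any copy number change carried by several samples, whereas the alternating fit removes the detected changes before re-estimating the reference. The actual method additionally models copy number as piecewise constant along the genome, which this per-probe sketch ignores.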
