Exploiting noise in array CGH data to improve detection of DNA copy number change

Developing effective methods for analyzing array-CGH data to detect chromosomal aberrations is very important for the diagnosis of pathogenesis of cancer and other diseases. Current analysis methods, being largely based on smoothing and/or segmentation, are not quite capable of detecting both the aberration regions and the boundary break points very accurately. Furthermore, when evaluating the accuracy of an algorithm for analyzing array-CGH data, it is commonly assumed that noise in the data follows normal distribution. A fundamental question is whether noise in array-CGH is indeed Gaussian, and if not, can one exploit the characteristics of noise to develop novel analysis methods that are capable of detecting accurately the aberration regions as well as the boundary break points simultaneously? By analyzing bacterial artificial chromosomes (BACs) arrays with an average 1 mb resolution, 19 k oligo arrays with the average probe spacing <100 kb and 385 k oligo arrays with the average probe spacing of about 6 kb, we show that when there are aberrations, noise in all three types of arrays is highly non-Gaussian and possesses long-range spatial correlations, and that such noise leads to worse performance of existing methods for detecting aberrations in array-CGH than the Gaussian noise case. We further develop a novel method, which has optimally exploited the character of the noise, and is capable of identifying both aberration regions as well as the boundary break points very accurately. Finally, we propose a new concept, posteriori signal-to-noise ratio (p-SNR), to assign certain confidence level to an aberration region and boundaries detected.

[1]  D. Pinkel,et al.  Array comparative genomic hybridization and its applications in cancer , 2005, Nature Genetics.

[2]  Joe W. Gray,et al.  Genome scanning with array CGH delineates regional alterations in mouse islet carcinomas , 2001, Nature Genetics.

[3]  Ajay N. Jain,et al.  Assembly of microarrays for genome-wide measurement of DNA copy number , 2001, Nature Genetics.

[4]  Sun-Yuan Kung,et al.  Accurate detection of aneuploidies in array CGH and gene expression microarray data , 2004, Bioinform..

[5]  Franck Picard,et al.  A statistical approach for array CGH data analysis , 2005, BMC Bioinformatics.

[6]  J. Squire,et al.  Chromosomal localization of DNA amplifications in neuroblastoma tumors using cDNA microarray comparative genomic hybridization. , 2003, Neoplasia.

[7]  Douglas Grove,et al.  Denoising array-based comparative genomic hybridization data using wavelets. , 2005, Biostatistics.

[8]  Emmanuel Barillot,et al.  Analysis of array CGH data: from signal ratio to gain and loss of DNA regions , 2004, Bioinform..

[9]  Peter J. Park,et al.  Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data , 2005, Bioinform..

[10]  M. Wigler,et al.  Circular binary segmentation for the analysis of array-based DNA copy number data. , 2004, Biostatistics.

[11]  Kenny Q. Ye,et al.  Large-Scale Copy Number Polymorphism in the Human Genome , 2004, Science.

[12]  V. Roychowdhury,et al.  Assessment of long-range correlation in time series: how to avoid pitfalls. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[13]  W. Kuo,et al.  High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays , 1998, Nature Genetics.

[14]  Bradley P. Coe,et al.  A tiling resolution DNA microarray with complete coverage of the human genome , 2004, Nature Genetics.

[15]  H. Döhner,et al.  Matrix‐based comparative genomic hybridization: Biochips to screen for genomic imbalances , 1997, Genes, chromosomes & cancer.

[16]  Pardis C Sabeti,et al.  Common deletion polymorphisms in the human genome , 2006, Nature Genetics.

[17]  Elena Marchiori,et al.  Breakpoint identification and smoothing of array comparative genomic hybridization data , 2004, Bioinform..

[18]  Ingrid K. Glad,et al.  CGH-Explorer: a program for analysis of array-CGH data , 2005, Bioinform..

[19]  Christian A. Rees,et al.  Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[20]  R. Tibshirani,et al.  A method for calling gains and losses in array CGH data. , 2005, Biostatistics.

[21]  Jane Fridlyand,et al.  Shaping of tumor and drug-resistant genomes by instability and selection , 2003, Oncogene.

[22]  C. Peng,et al.  Mosaic organization of DNA nucleotides. , 1994, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.

[23]  H. Ostrer,et al.  A versatile statistical analysis algorithm to detect genome copy number variation. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[24]  Jianbo Gao,et al.  Inertia and memory in ambiguous visual perception , 2006, Cognitive Processing.

[25]  Jaakko Astola,et al.  CGH-Plotter: MATLAB toolbox for CGH-data analysis , 2003, Bioinform..

[26]  D. Conrad,et al.  A high-resolution survey of deletion polymorphism in the human genome , 2006, Nature Genetics.

[27]  Paul H. C. Eilers,et al.  Quantile smoothing of array CGH data , 2005, Bioinform..

[28]  Ajay N. Jain,et al.  Hidden Markov models approach to the analysis of array CGH data , 2004 .

[29]  Benoit B. Mandelbrot,et al.  Fractal Geometry of Nature , 1984 .

[30]  A. Barzilai,et al.  Impaired genomic stability and increased oxidative stress exacerbate different features of Ataxia-telangiectasia. , 2005, Human molecular genetics.