Denoising array-based comparative genomic hybridization data using wavelets.

Array-based comparative genomic hybridization (array-CGH) provides a high-throughput, high-resolution method for measuring relative changes in DNA copy number simultaneously at thousands of genomic loci. Typically, these measurements are reported and displayed linearly on chromosome maps, and gains and losses are detected as deviations from the copy number of normal diploid cells. We propose denoising the data to uncover the true copy number changes before drawing inferences about the patterns of aberrations in the samples. Nonparametric techniques are particularly suitable for this task because they find structure in the data without imposing a parametric model. In this paper, we denoise the data with wavelets, which have sound theoretical properties and a fast computational algorithm and are particularly well suited to the abrupt changes seen in array-CGH data. A simulation study shows that denoising the data before testing yields greater power to detect an aberrant spot than testing the raw data. Finally, we illustrate the method on two array-CGH data sets.
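
To make the idea of wavelet denoising concrete, the following is a minimal sketch of wavelet shrinkage applied to a simulated profile of log2 ratios. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: it assumes a decimated Haar transform, equally spaced loci, a profile length that is a power of two, a noise level estimated from the median absolute value of the finest-scale coefficients, and soft thresholding at the universal threshold. The function names, the simulated gain of 0.58 (roughly log2(3/2)), and the noise level of 0.15 are hypothetical choices made only for this example.

```python
import math
import random
import statistics


def haar_forward(signal):
    """Orthonormal Haar decomposition; len(signal) must be a power of two."""
    approx = list(signal)
    details = []  # detail coefficients, finest scale first
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding from the coarsest scale to the finest."""
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(approx, detail):
            rebuilt.append((s + d) / math.sqrt(2))
            rebuilt.append((s - d) / math.sqrt(2))
        approx = rebuilt
    return approx


def soft(x, t):
    """Soft thresholding: shrink x toward zero by t, zeroing values within t of zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def denoise(signal):
    """Shrink the Haar detail coefficients and reconstruct the profile."""
    approx, details = haar_forward(signal)
    # Assumed noise estimate: median absolute finest-scale coefficient / 0.6745.
    sigma = statistics.median([abs(d) for d in details[0]]) / 0.6745
    # Universal threshold: sigma * sqrt(2 * log(n)).
    threshold = sigma * math.sqrt(2 * math.log(len(signal)))
    shrunk = [[soft(d, threshold) for d in level] for level in details]
    return haar_inverse(approx, shrunk)


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical simulated profile: 256 loci with a single-copy gain at loci 96..159.
random.seed(0)
truth = [0.58 if 96 <= i < 160 else 0.0 for i in range(256)]
noisy = [t + random.gauss(0.0, 0.15) for t in truth]
smooth = denoise(noisy)

print("raw MSE:     ", mse(noisy, truth))
print("denoised MSE:", mse(smooth, truth))
```

The Haar basis is used here only because its piecewise-constant shape matches the step-like gains and losses described above; unequally spaced clones or a profile length that is not a power of two would need a different transform or padding scheme.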
