A statistical approach for array CGH data analysis

Background
Microarray-CGH experiments are used to detect and map chromosomal imbalances by hybridizing targets of genomic DNA from a test and a reference sample to sequences immobilized on a slide. These probes are genomic DNA sequences (BACs, bacterial artificial chromosomes) that are mapped on the genome. The signal has a spatial coherence along the genome that dedicated statistical tools can exploit, and segmentation methods seem to be a natural framework for this purpose. A CGH profile can be viewed as a succession of segments that represent homogeneous regions of the genome whose BACs share the same relative copy number on average. We model a CGH profile as a random Gaussian process whose distribution parameters are affected by abrupt changes at unknown coordinates. Two major problems arise: determining which parameters are affected by the abrupt changes (the mean and the variance, or the mean only), and selecting the number of segments in the profile.

Results
We demonstrate that existing methods for estimating the number of segments are not well suited to array CGH data, and we propose an adaptive criterion that detects previously mapped chromosomal aberrations. The performance of this method is discussed based on simulations and publicly available data sets. We then discuss the choice of model for array CGH data and show that the model with a homogeneous variance is suited to this context.

Conclusions
Array CGH data analysis is an emerging field that needs appropriate statistical tools. Process segmentation and model selection provide a theoretical framework that allows precise biological interpretation. Adaptive methods for model selection give promising results for estimating the number of altered regions on the genome.
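To make the homogeneous-variance segmentation model concrete, here is a minimal sketch, not the authors' implementation. It assumes the log-ratios follow a Gaussian process with a piecewise-constant mean and a single common variance. Under that assumption, maximizing the likelihood for a fixed number of segments K amounts to minimizing the residual sum of squares. The sketch uses one standard approach, segment-neighbourhood dynamic programming, to find the best partition for every K from 1 to `k_max`. The function name `segment_means` and its return layout are illustrative choices. The adaptive criterion for choosing K described in the Results is not reproduced here.

```python
import numpy as np


def segment_means(y, k_max):
    """Best partitions of y into 1..k_max segments under a piecewise-constant
    mean / common-variance Gaussian model (least-squares contrast).

    Returns (costs, breakpoints): costs[k-1] is the minimal residual sum of
    squares with k segments, breakpoints[k-1] lists the start index of
    every segment after the first.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 1 <= k_max <= n:
        raise ValueError("k_max must be between 1 and len(y)")
    # Prefix sums give the RSS of any segment y[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))

    def cost(i, j):
        # Residual sum of squares of y[i:j] around its own mean.
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # C[k, j]: minimal cost of splitting y[:j] into k + 1 segments.
    C = np.full((k_max, n + 1), np.inf)
    back = np.zeros((k_max, n + 1), dtype=int)
    for j in range(1, n + 1):
        C[0, j] = cost(0, j)
    for k in range(1, k_max):
        for j in range(k + 1, n + 1):
            # The last segment is y[i:j]; the first k segments cover y[:i].
            for i in range(k, j):
                c = C[k - 1, i] + cost(i, j)
                if c < C[k, j]:
                    C[k, j] = c
                    back[k, j] = i

    costs, breakpoints = [], []
    for k in range(k_max):
        costs.append(float(C[k, n]))
        # Walk back from the end of the profile to recover change points.
        starts, j = [], n
        for kk in range(k, 0, -1):
            j = back[kk, j]
            starts.append(int(j))
        breakpoints.append(sorted(starts))
    return costs, breakpoints


if __name__ == "__main__":
    # Noise-free toy profile: normal, gained, normal.
    profile = [0, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0]
    costs, bps = segment_means(profile, 4)
    print(bps[2])  # → [4, 8]
```

The costs decrease as K grows, so the raw contrast alone always favours more segments. That is why a separate model-selection step is needed to choose the number of segments. The triple loop is O(k_max · n²), which is acceptable for a single chromosome's worth of BACs.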
