Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data

MOTIVATION: Array Comparative Genomic Hybridization (CGH) can reveal chromosomal aberrations in genomic DNA. These amplifications and deletions at the DNA level are important in the pathogenesis of cancer and other diseases. Although many approaches have been proposed for analyzing large array CGH datasets, the relative merits of these methods in practice are not clear.

RESULTS: We compare 11 different algorithms for analyzing array CGH data. These include both segment detection methods and smoothing methods, based on diverse techniques such as mixture models, Hidden Markov Models, maximum likelihood, regression, wavelets and genetic algorithms. We compute Receiver Operating Characteristic (ROC) curves on simulated data to quantify sensitivity and specificity at various signal-to-noise ratios and for different sizes of abnormalities. We also characterize the algorithms' performance on chromosomal regions of interest in a real dataset obtained from patients with Glioblastoma Multiforme. Although comparisons of this type are difficult because the parameter choices for each method may be sub-optimal, they nevertheless reveal general characteristics that are helpful to the biological investigator.
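The simulation-based ROC evaluation described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes a single centered gain on one simulated chromosome, Gaussian noise, and a signal-to-noise ratio defined as aberration height divided by noise standard deviation. A plain moving average stands in for the smoothing step and is not necessarily one of the 11 compared algorithms. All function names and default parameters are hypothetical.

```python
import random

def simulate_profile(n_probes=100, width=20, snr=2.0, sigma=0.25, seed=0):
    """Simulate log-ratios for one chromosome with a single centered gain.

    Assumption: SNR = aberration height / noise standard deviation.
    Returns (log_ratios, truth), where truth[i] is True inside the gain.
    """
    rng = random.Random(seed)
    start = (n_probes - width) // 2
    truth = [start <= i < start + width for i in range(n_probes)]
    height = snr * sigma
    ratios = [rng.gauss(height if t else 0.0, sigma) for t in truth]
    return ratios, truth

def moving_average(values, half_window=2):
    """Stand-in smoother: mean over a window truncated at the edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def roc_points(scores, truth):
    """Sweep a calling threshold from high to low; return (FPR, TPR) points."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, truth) if s >= thr and t)
        fp = sum(1 for s, t in zip(scores, truth) if s >= thr and not t)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

if __name__ == "__main__":
    # Compare raw and smoothed calls across noise levels and aberration widths.
    for snr in (1.0, 2.0, 4.0):
        for width in (5, 20, 40):
            ratios, truth = simulate_profile(width=width, snr=snr)
            raw = auc(roc_points(ratios, truth))
            smooth = auc(roc_points(moving_average(ratios), truth))
            print(f"snr={snr} width={width} raw={raw:.3f} smoothed={smooth:.3f}")
```

Each probe is called a gain when its score is at or above the threshold, so the sweep traces sensitivity (TPR) against one minus specificity (FPR). Replacing `moving_average` with any other smoother or segmenter that returns one score per probe gives a comparable curve for that method.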
