A fast and flexible method for the segmentation of aCGH data

MOTIVATION Array Comparative Genomic Hybridization (aCGH) is used to scan the entire genome for variations in DNA copy number. A central task in the analysis of aCGH data is segmentation: partitioning the probes into contiguous groups that share the same DNA copy number. Some well-known segmentation methods have very long running times, which prevents interactive data analysis.

RESULTS We propose a new segmentation method based on wavelet decomposition and thresholding, which detects significant breakpoints in the data. Our algorithm is over 1000 times faster than leading approaches while achieving similar segmentation performance. Another key advantage of the proposed method is its simplicity and flexibility: because of its intuitive structure, it can easily be generalized to incorporate several types of side information. Here we consider two extensions. The first incorporates side information indicating the reliability of each measurement, and the second compensates for variability in the measurement noise that changes along the data. The resulting algorithm outperforms existing methods, in terms of both speed and performance, when applied to real high-density CGH data.

AVAILABILITY An implementation is available under the software tab at: http://www.ee.technion.ac.il/Sites/People/YoninaEldar/.
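To make the idea of breakpoint detection by wavelet decomposition and thresholding concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-scale, undecimated Haar-style contrast (the difference between the mean of the next and the previous `half_width` probes), a noise level estimated from the median absolute deviation of first differences, and the universal threshold sigma * sqrt(2 log n). All function names and the `half_width` parameter are hypothetical choices made for illustration; the two extensions mentioned in the abstract (reliability weights and changing noise variability) are not covered.

```python
import numpy as np


def haar_detail(signal, half_width):
    """Haar-style contrast: mean of the next `half_width` probes minus
    the mean of the previous `half_width` probes, at every position."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    csum = np.concatenate(([0.0], np.cumsum(signal)))
    detail = np.zeros(n)
    i = np.arange(half_width, n - half_width + 1)
    left = csum[i] - csum[i - half_width]    # sum over [i - h, i)
    right = csum[i + half_width] - csum[i]   # sum over [i, i + h)
    detail[i] = (right - left) / half_width
    return detail


def estimate_noise_sigma(signal):
    """Robust noise estimate from the MAD of first differences
    (assumes Gaussian noise and only a few breakpoints)."""
    d = np.diff(np.asarray(signal, dtype=float))
    mad = np.median(np.abs(d - np.median(d)))
    return mad / 0.6745 / np.sqrt(2.0)


def detect_breakpoints(signal, half_width=8):
    """Return sorted indices b such that a new segment starts at probe b."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    detail = np.abs(haar_detail(signal, half_width))
    sigma = estimate_noise_sigma(signal)
    # Noise std of one contrast is sigma * sqrt(2 / h); scale it by the
    # universal threshold factor sqrt(2 log n).
    threshold = sigma * np.sqrt(2.0 / half_width) * np.sqrt(2.0 * np.log(n))
    score = np.where(detail > threshold, detail, 0.0)
    breakpoints = []
    # Greedy peak picking: take the strongest contrast, then suppress
    # its neighbourhood so one jump yields one breakpoint.
    while score.max() > 0:
        i = int(np.argmax(score))
        breakpoints.append(i)
        score[max(0, i - half_width):min(n, i + half_width + 1)] = 0.0
    return sorted(breakpoints)


def segment_means(signal, breakpoints):
    """Piecewise-constant fit: replace each segment by its mean."""
    signal = np.asarray(signal, dtype=float)
    fitted = np.empty_like(signal)
    edges = [0] + list(breakpoints) + [len(signal)]
    for start, stop in zip(edges[:-1], edges[1:]):
        fitted[start:stop] = signal[start:stop].mean()
    return fitted


# Synthetic example (made-up data): one gained region of 100 probes.
rng = np.random.default_rng(0)
truth = np.concatenate([np.zeros(100), np.ones(100), np.zeros(100)])
log_ratios = truth + rng.normal(0.0, 0.1, size=truth.size)
breakpoints = detect_breakpoints(log_ratios)
fitted = segment_means(log_ratios, breakpoints)
```

Because the contrast is computed from cumulative sums, the whole pass is linear in the number of probes, which is in the spirit of the speed claim above. A fuller treatment would combine several scales rather than the single window width assumed here.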
