MSMAD: a computationally efficient method for the analysis of noisy array CGH data

MOTIVATION: Genome analysis has become one of the most important tools for understanding the complex process of carcinogenesis. As the resolution of CGH arrays increases, so does the demand for computationally efficient algorithms that detect aberrations reliably even in very noisy data.

RESULTS: We developed a simple, non-parametric, computationally efficient technique for array CGH analysis that uses median smoothing for pre-processing and a median absolute deviation concept for breakpoint detection. The resulting algorithm has the potential to outperform any single smoothing approach as well as several recently proposed segmentation techniques. We demonstrate its performance on simulated and real datasets in comparison with three other methods for array CGH analysis.

IMPLEMENTATION: Our approach is implemented in the R language and environment for statistical computing (version 2.6.1 for Windows, R-project, 2007). The code is available at: http://www.iba.muni.cz/~budinska/msmad.html.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
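
The two ingredients named in the abstract, median smoothing followed by a median-absolute-deviation (MAD) criterion for breakpoints, can be illustrated with a minimal sketch. This is not the authors' R implementation: the function names, the window width of 5, the multiplier `k = 3.0`, and the rule "flag a jump between neighbouring smoothed values that exceeds `k` times the scaled MAD of the residuals" are all assumptions chosen for illustration, and Python is used only for brevity.

```python
from statistics import median


def running_median(values, window=5):
    """Centered running median; the window shrinks at both ends of the profile."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def mad(values):
    """Median absolute deviation around the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def detect_breakpoints(log_ratios, window=5, k=3.0):
    """Return indices where a new segment is assumed to start.

    Hypothetical rule (an assumption, not taken from the paper): a breakpoint
    is called between probes i and i + 1 when the jump in the median-smoothed
    profile exceeds k times a robust noise estimate.
    """
    smoothed = running_median(log_ratios, window)
    residuals = [raw - fit for raw, fit in zip(log_ratios, smoothed)]
    # 1.4826 rescales the MAD to the standard deviation under Gaussian noise.
    threshold = k * 1.4826 * mad(residuals)
    return [
        i + 1
        for i in range(len(smoothed) - 1)
        if abs(smoothed[i + 1] - smoothed[i]) > threshold
    ]


# Toy profile: 21 probes near log-ratio 0, then 21 probes near log-ratio 1.
noise = [0.0, 0.1, -0.1]
profile = [(0.0 if i < 21 else 1.0) + noise[i % 3] for i in range(42)]
print(detect_breakpoints(profile))  # prints [21]
```

Both steps rely only on medians, so a few outlying probes have little influence on the smoothed profile or the noise estimate. The sketch makes no attempt to merge neighbouring breakpoints or to call gains and losses.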
