CGHregions: Dimension Reduction for Array CGH Data with Minimal Information Loss

We introduce an algorithm that reduces multi-sample array CGH data from thousands of clones to tens or hundreds of clone regions. The reduction loses little information, which is possible because neighboring clones are strongly dependent. The algorithm is explained using a small example. Its potential benefits for downstream analysis are illustrated by a re-analysis of previously published colorectal cancer data. Using multiple testing corrections suitable for these data, we provide statistical evidence for genomic differences at several clone regions between MSI+ and CIN+ tumors. The algorithm, named CGHregions, is available as an easy-to-use script in R.
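The idea of exploiting dependence between neighboring clones can be illustrated with a minimal sketch. This is not the CGHregions algorithm itself, which tolerates a small, controlled information loss; the sketch covers only the simplest lossless case, in which adjacent clones whose calls are identical across all samples are merged into one region. The function name `collapse_identical_neighbors`, the coding of calls as -1 (loss), 0 (normal), 1 (gain), and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple


def collapse_identical_neighbors(calls: List[List[int]]) -> List[Tuple[int, int]]:
    """Merge runs of adjacent clones with identical call profiles.

    calls[i][j] is the (hypothetical) call for clone i in sample j:
    -1 = loss, 0 = normal, 1 = gain. Clones are assumed to be ordered
    by genomic position on a single chromosome.

    Returns a list of (start, end) clone index pairs, both inclusive.
    """
    regions: List[Tuple[int, int]] = []
    if not calls:
        return regions

    start = 0
    for i in range(1, len(calls)):
        # A region ends as soon as any sample's call changes.
        if calls[i] != calls[i - 1]:
            regions.append((start, i - 1))
            start = i
    regions.append((start, len(calls) - 1))
    return regions


# Toy data: 6 clones (rows) by 3 samples (columns).
calls = [
    [0, 1, -1],
    [0, 1, -1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
]

regions = collapse_identical_neighbors(calls)
print(regions)  # [(0, 1), (2, 4), (5, 5)]
```

Here six clones reduce to three regions with no loss of information. Allowing neighboring clones to differ in a small number of samples, as a method with a bounded information loss would, reduces the number of regions further.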
