CNAnova: a new approach for finding recurrent copy number abnormalities in cancer SNP microarray data

MOTIVATION: The current generation of single nucleotide polymorphism (SNP) arrays allows measurement of copy number aberrations (CNAs) in cancer at more than one million locations in the genome in hundreds of tumour samples. Most research has focused on single-sample CNA discovery, the so-called segmentation problem. The availability of high-density, large sample-size SNP array datasets makes the identification of recurrent copy number changes in cancer an important issue that can be addressed using cross-sample information.

RESULTS: We present a novel approach, called CNAnova, for finding regions of recurrent copy number aberrations from Affymetrix SNP 6.0 array data. The method derives its statistical properties from a control dataset composed of normal samples and, in contrast to previous methods, does not require segmentation or permutation steps. For rigorous testing of the algorithm and comparison to existing methods, we developed a simulation scheme that uses the noise distribution present in Affymetrix arrays. Application of the method to 128 acute lymphoblastic leukaemia samples shows that CNAnova achieves a lower error rate than a popular alternative approach. We also describe an extension of the CNAnova framework to identify recurrent CNA regions with intra-tumour heterogeneity, present in either primary or relapsed samples from the same patients.

AVAILABILITY: The CNAnova package and synthetic datasets are available at http://www.compbio.group.cam.ac.uk/software.html.
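As a rough illustration of the general idea stated in the abstract (a null distribution taken from normal control samples, with no segmentation or permutation step), the sketch below flags probes whose mean log2 ratio across tumours departs from what the controls predict. It is not the CNAnova algorithm, whose details are not given here. The statistic, the normal approximation, the Benjamini-Hochberg correction, the simulated data and every function name are assumptions made for illustration only.

```python
import math
import random
import statistics


def probe_means(samples):
    """Average log2 ratio at each probe across samples (rows are samples)."""
    n_probes = len(samples[0])
    return [statistics.fmean(s[i] for s in samples) for i in range(n_probes)]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices rejected by the BH step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


def recurrent_probes(tumours, normals, alpha=0.05):
    """Flag probes whose cross-tumour mean deviates from a control-derived null.

    Assumption: the null is a normal distribution whose mean and spread are
    estimated from the per-probe means of the normal control samples.
    """
    null = probe_means(normals)
    mu = statistics.fmean(null)
    sigma = statistics.stdev(null)
    # The null spread was estimated from len(normals) samples per probe;
    # rescale it to the number of tumour samples being averaged.
    scale = sigma * math.sqrt(len(normals) / len(tumours))
    pvalues = []
    for x in probe_means(tumours):
        z = abs(x - mu) / scale
        pvalues.append(math.erfc(z / math.sqrt(2)))  # two-sided normal p-value
    return benjamini_hochberg(pvalues, alpha)


def simulate(seed=0, n_probes=200, n_normals=40, n_tumours=30):
    """Toy data: Gaussian noise, plus a gain at probes 80..99 in 18 tumours."""
    rng = random.Random(seed)
    normals = [[rng.gauss(0.0, 0.2) for _ in range(n_probes)]
               for _ in range(n_normals)]
    tumours = []
    for s in range(n_tumours):
        row = [rng.gauss(0.0, 0.2) for _ in range(n_probes)]
        if s < 18:  # the aberration is recurrent but not present in every sample
            for i in range(80, 100):
                row[i] += 0.5
        tumours.append(row)
    return tumours, normals


if __name__ == "__main__":
    tumours, normals = simulate()
    print(recurrent_probes(tumours, normals))
```

The toy noise here is plain Gaussian, whereas the abstract describes a simulation scheme built on the noise distribution observed in Affymetrix arrays. Real array data would also need normalisation and handling of correlated neighbouring probes, which this sketch ignores.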
