A flexible rank-based framework for detecting copy number aberrations from array data

MOTIVATION: DNA copy number aberration, both inherited and sporadic, is a significant contributor to a variety of human diseases. Copy number characterization is therefore an area of intense research. Probe hybridization-based arrays are important tools for measuring copy number in a high-throughput manner.

RESULTS: In this article, we present a simple but powerful nonparametric rank-based approach to detect deletions and gains from raw array copy number measurements. We use three different rank-based statistics to detect three separate molecular phenomena: somatic lesions, germline deletions and germline gains. The approach is robust and rigorously grounded in statistical theory, which makes it possible to assign a meaningful statistical significance to each putative aberration. We demonstrate the flexibility of our approach by applying it to data from three different array platforms. Applied to published, well-characterized samples, our method compares favorably with established approaches. Power simulations demonstrate high sensitivity for array data of reasonable quality.

CONCLUSIONS: Our flexible rank-based framework is suitable for multiple platforms, including single nucleotide polymorphism arrays and array comparative genomic hybridization, and can reliably detect gains or losses of genomic DNA, whether inherited, de novo, or somatic.

AVAILABILITY: An R package, RankCopy, containing the methods described here is freely available from the author's web site (http://mendel.gene.cwru.edu/laframboiselab/).

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
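
To illustrate the general idea of rank-based detection with an attached significance level, the following is a minimal Python sketch and not the RankCopy implementation. The abstract does not name its three statistics, so this sketch substitutes a standard one-sided Wilcoxon rank-sum test (normal approximation, no tie correction) comparing each window of probes against all remaining probes, followed by Benjamini-Hochberg adjustment. The function names, the fixed non-overlapping window scheme, and the simulated log-ratio data are all assumptions made for illustration.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(window, background, alternative="less"):
    """One-sided Wilcoxon rank-sum p-value (normal approximation).

    alternative="less" asks whether the window sits below the background
    (a candidate deletion); "greater" asks about a candidate gain.
    """
    n, m = len(window), len(background)
    ranks = average_ranks(list(window) + list(background))
    w = sum(ranks[:n])                       # rank sum of the window
    mean = n * (n + m + 1) / 2.0             # null expectation of w
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (w - mean) / sd
    lower = 0.5 * math.erfc(-z / math.sqrt(2.0))   # P(Z <= z)
    return lower if alternative == "less" else 1.0 - lower


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def simulate_and_test(seed=0, n_probes=200, window=20):
    """Simulate log-ratios with a deletion at probes 80-99 (hypothetical
    data) and return one adjusted p-value per non-overlapping window."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 0.2) for _ in range(n_probes)]
    for i in range(80, 100):
        signal[i] -= 0.5                     # simulated copy number loss
    pvals = []
    for start in range(0, n_probes, window):
        inside = signal[start:start + window]
        outside = signal[:start] + signal[start + window:]
        pvals.append(rank_sum_pvalue(inside, outside, "less"))
    return benjamini_hochberg(pvals)


if __name__ == "__main__":
    for w, q in enumerate(simulate_and_test()):
        print(f"window {w}: adjusted p = {q:.3g}")
```

Because only ranks enter the statistic, the test does not depend on the scale or distribution of the raw intensities, which is one plausible reason a rank-based approach carries over across array platforms. A practical detector would also need to choose window boundaries from the data rather than fixing them in advance.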
