On the frequency of copy number variants

MOTIVATION: Estimating the frequency distribution of copy number variants (CNVs) is an important part of the effort to characterize this new type of genetic variation. Currently, most studies report a strong skew toward low-frequency CNVs. In this article, we investigate the frequencies of CNVs. We employ a two-step procedure to estimate CNV frequencies. We use family information a posteriori to select only the most reliable CNV regions, i.e. those showing high rates of Mendelian transmission.

RESULTS: Our results suggest that the current skew toward low-frequency CNVs may not be representative of the true frequency distribution, but may be due, among other reasons, to the non-negligible false negative rates that characterize CNV detection methods. False positives are also likely, because low-frequency CNVs are hard to detect with small sample sizes and with technologies that are not ideally suited to their detection. Without appropriate validation methods, such as the incorporation of biologically relevant information (in our case, the transmission of heritable CNVs from parents to offspring), it is difficult to assess the validity of specific CNVs, and even harder to obtain reliable frequency estimates.
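As an illustration of the family-based filtering idea described above, the following minimal Python sketch scores each CNV region by the fraction of child calls that are also seen in at least one parent, and keeps regions above a threshold. This is only one possible way to define a transmission rate, not necessarily the one used in the article. The function names, the input layout (`calls`, `trios`), and the `min_rate` value of 0.9 are all hypothetical choices made for this example.

```python
def transmission_rates(calls, trios):
    """Per-region fraction of child CNV calls also seen in a parent.

    calls: dict mapping region ID -> set of sample IDs called as carriers
           (hypothetical input layout).
    trios: list of (child, father, mother) sample-ID tuples.
    Regions with no carrier children are omitted, since they give no
    evidence about transmission.
    """
    rates = {}
    for region, carriers in calls.items():
        child_calls = 0
        supported = 0
        for child, father, mother in trios:
            if child in carriers:
                child_calls += 1
                # A heritable CNV in the child should appear in a parent.
                if father in carriers or mother in carriers:
                    supported += 1
        if child_calls > 0:
            rates[region] = supported / child_calls
    return rates


def reliable_regions(calls, trios, min_rate=0.9):
    """Return regions whose transmission rate is at least min_rate.

    min_rate is an assumed, illustrative threshold.
    """
    rates = transmission_rates(calls, trios)
    return sorted(r for r, rate in rates.items() if rate >= min_rate)
```

Under these assumptions, frequencies would then be estimated only for the regions returned by `reliable_regions`, for example among the unrelated parents. A region called frequently in children but never in their parents receives a low rate and is dropped as a likely artifact.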
