Population-genetic properties of differentiated human copy-number polymorphisms.

Copy-number variants (CNVs) can reach appreciable frequencies in the human population, and several of these copy-number polymorphisms (CNPs) have recently been associated with human diseases, including lupus, psoriasis, Crohn disease, and obesity. Despite recent advances, significant biases remain in CNP discovery and genotyping. We developed a genotyping method based on single-channel microarray intensity data, benchmarked it against copy numbers determined from sequencing read depth, and used it to obtain genotypes for 1495 CNPs in 487 human DNA samples of diverse ethnic backgrounds. The microarray included CNPs in segmental duplication-rich regions and insertions of sequences not represented in the reference genome assembly or on standard SNP microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that biallelic CNPs show greater stratification than frequency-matched SNPs (p = 0.0026). Although copy number at biallelic CNPs correlates strongly with flanking SNP genotypes, this is not the case for most multicopy CNPs (only 40% with r > 0.8). We selected a subset of CNPs for further characterization in 1876 additional samples from 62 populations; this revealed striking population-differentiated structural variants in genes of clinical significance such as OCLN, which encodes a tight junction protein involved in hepatitis C viral entry. Our microarray design allows these variants to be rapidly tested for disease association, and our results suggest that CNPs (especially those that cannot be imputed from SNP genotypes) might have contributed disproportionately to human diversity and selection.
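The two quantities the abstract relies on, the correlation between copy number and a flanking SNP genotype and the degree of population differentiation, can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names, the toy genotypes, and the choice of a simple Wright-style F_ST, (H_T - H_S) / H_T, for a biallelic locus are assumptions made for illustration, and the estimator used in the study may differ.

```python
from statistics import mean


def pearson_r(copy_numbers, snp_genotypes):
    """Pearson correlation between integer copy numbers and SNP genotypes (0/1/2)."""
    mx, my = mean(copy_numbers), mean(snp_genotypes)
    cov = sum((x - mx) * (y - my) for x, y in zip(copy_numbers, snp_genotypes))
    vx = sum((x - mx) ** 2 for x in copy_numbers)
    vy = sum((y - my) ** 2 for y in snp_genotypes)
    if vx == 0 or vy == 0:
        return 0.0  # no variation, so the correlation is undefined; report 0
    return cov / (vx * vy) ** 0.5


def wright_fst(allele_freqs, sample_sizes):
    """Wright-style F_ST for one biallelic locus across populations.

    allele_freqs: frequency of one allele in each population (hypothetical input)
    sample_sizes: number of sampled individuals per population, used as weights
    """
    total = sum(sample_sizes)
    p_bar = sum(p * n for p, n in zip(allele_freqs, sample_sizes)) / total
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    h_sub = sum(2 * p * (1 - p) * n for p, n in zip(allele_freqs, sample_sizes)) / total
    return 0.0 if h_total == 0 else (h_total - h_sub) / h_total


# Toy data: a biallelic deletion perfectly tagged by a flanking SNP
copy_numbers = [0, 1, 2, 1, 0, 2]
snp_genotypes = [0, 1, 2, 1, 0, 2]
r = pearson_r(copy_numbers, snp_genotypes)
taggable = r > 0.8  # threshold quoted in the abstract

# Toy data: an allele at 10% in one population and 90% in another
fst = wright_fst([0.1, 0.9], [50, 50])
```

Under this sketch, a CNP whose copy number tracks a nearby SNP passes the r > 0.8 threshold and could be imputed from SNP data, while a multicopy CNP with weak correlation would need direct genotyping.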
