Mapping copy number variation by population-scale genome sequencing

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide-resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole-genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysis of their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high-frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
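Integrating evidence from complementary SV discovery approaches can be illustrated with a minimal sketch. It assumes, for illustration only, that deletion calls from different methods are merged when every pair in a group shares at least 50% reciprocal overlap, and that split-read breakpoints are preferred when available because split-read evidence resolves breakpoints to the base pair, whereas read-pair and read-depth evidence give approximate boundaries. The names `Call`, `merge_calls` and `consensus`, the 0.5 threshold and the method labels are hypothetical; this is not the pipeline used in this study.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """A hypothetical deletion call: 0-based, half-open interval plus the method that found it."""
    chrom: str
    start: int
    end: int
    method: str  # assumed labels, e.g. "read-pair", "split-read", "read-depth"


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Overlap length divided by the longer interval (the smaller of the two overlap fractions)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


@dataclass
class Cluster:
    calls: list[Call] = field(default_factory=list)

    @property
    def methods(self) -> set[str]:
        """Distinct discovery methods supporting this putative SV."""
        return {c.method for c in self.calls}


def merge_calls(calls: list[Call], min_ro: float = 0.5) -> list[Cluster]:
    """Greedy O(n^2) merge: a call joins the first cluster whose members it all overlaps at >= min_ro."""
    clusters: list[Cluster] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start)):
        for cluster in clusters:
            if all(reciprocal_overlap(call, other) >= min_ro for other in cluster.calls):
                cluster.calls.append(call)
                break
        else:
            clusters.append(Cluster([call]))
    return clusters


def consensus(cluster: Cluster) -> tuple[int, int]:
    """Assumed heuristic: use split-read coordinates if present, otherwise median start and end."""
    split = [c for c in cluster.calls if c.method == "split-read"]
    if split:
        return split[0].start, split[0].end
    starts = sorted(c.start for c in cluster.calls)
    ends = sorted(c.end for c in cluster.calls)
    return starts[len(starts) // 2], ends[len(ends) // 2]


# Usage with made-up coordinates: three methods agree on one deletion, a fourth call stands alone.
calls = [
    Call("chr1", 1000, 2000, "read-pair"),
    Call("chr1", 1050, 2020, "split-read"),
    Call("chr1", 1100, 1900, "read-depth"),
    Call("chr1", 5000, 5200, "read-pair"),
]
clusters = merge_calls(calls)
for cl in clusters:
    print(consensus(cl), sorted(cl.methods))
```

Requiring every member of a cluster (not just one) to meet the overlap threshold keeps a long call from chaining together unrelated short calls; clusters supported by several methods could then be prioritized for experimental validation.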
