A map of human genome variation from population-scale sequencing

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother–father–child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
