Population Structure in a Comprehensive Genomic Data Set on Human Microsatellite Variation

Over the past two decades, microsatellite genotypes have provided the data for landmark studies of human population-genetic variation. However, the various microsatellite data sets have been prepared with different procedures and sets of markers, making it difficult to synthesize the available data for a comprehensive analysis. Here, we combine eight human population-genetic data sets at the 645 microsatellite loci they share, accounting for procedural differences in the production of the different data sets, to assemble a single data set containing 5795 individuals from 267 worldwide populations. We perform a systematic analysis of genetic relatedness, detecting 240 intra-population and 92 inter-population pairs of previously unidentified close relatives and proposing standardized subsets of unrelated individuals for use in future studies. We then augment the human data with a data set of 84 chimpanzees at the 246 loci they share with the human samples. Multidimensional scaling and neighbor-joining analyses of these data sets offer new insights into the structure of human populations and enable a comparison of genetic variation patterns in chimpanzees with those in humans. Our combined data sets are the largest of their kind reported to date and provide a resource for use in human population-genetic studies.
