A Unified Characterization of Population Structure and Relatedness

Many population genetic activities, ranging from evolutionary studies to association mapping to forensic identification, rely on appropriate estimates of population structure or relatedness. All applications require recognition that quantities with an underlying meaning of allelic dependence are not defined in an absolute sense, but instead are made “relative to” some set of alleles other than the target set. The 1984 Weir and Cockerham FST estimate made explicit that the reference set of alleles was across populations, whereas standard kinship estimates do not make the reference explicit. Weir and Cockerham stated that their FST estimates were for independent populations, and standard kinship estimates implicitly assume that pairs of individuals in a study sample, other than the target pair, are unrelated or are not inbred. However, populations lose independence when there is migration between them, and dependencies between pairs of individuals in a population exist for more than one target pair. We have therefore recast our treatments of population structure, relatedness, and inbreeding to make explicit that the parameters of interest involve the differences in degrees of allelic dependence between the target and the reference sets of alleles, and so can be negative. We take the reference set to be the population from which study individuals have been sampled. We provide simple moment estimates of these parameters, phrased in terms of allelic matching within and between individuals for relatedness and inbreeding, or within and between populations for population structure. A multi-level hierarchy of alleles within individuals, alleles between individuals within populations, and alleles between populations allows a unified treatment of relatedness and population structure.
We expect our new measures to have a wide range of applications, but we note that their estimates are sensitive to rare or private variants: some population-characterization applications suggest exploiting those sensitivities, whereas estimation of relatedness may best use all genetic markers without filtering on minor allele frequency.
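To make the idea of moment estimates built from allelic matching concrete, the following is a minimal sketch for the population-structure case only. It assumes the estimate takes the form (M_W - M_B) / (1 - M_B), where M_W is the proportion of matching allele pairs within populations and M_B the proportion between populations, each averaged over loci before the ratio is formed. That form, the function names, and the data layout are illustrative assumptions and are not given in the text above.

```python
from itertools import combinations


def within_matching(counts):
    """Proportion of pairs of distinct sampled alleles that match
    within one population at one locus. Requires at least 2 alleles."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def between_matching(counts_a, counts_b):
    """Proportion of allele pairs, one from each population, that match.
    Both lists must list the same alleles in the same order."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    return sum(a * b for a, b in zip(counts_a, counts_b)) / (n_a * n_b)


def beta_estimates(data):
    """data[locus][population] is a list of allele counts.

    Returns (population-specific estimates, overall estimate), using the
    assumed form (M_W - M_B) / (1 - M_B) with matching averaged over loci.
    """
    n_loci = len(data)
    n_pops = len(data[0])
    pairs = list(combinations(range(n_pops), 2))

    # Average within-population matching, one value per population.
    m_within = [
        sum(within_matching(locus[i]) for locus in data) / n_loci
        for i in range(n_pops)
    ]
    # Average between-population matching over all population pairs and loci.
    m_between = sum(
        between_matching(locus[i], locus[j]) for locus in data for i, j in pairs
    ) / (n_loci * len(pairs))

    per_pop = [(m - m_between) / (1 - m_between) for m in m_within]
    overall = (sum(m_within) / n_pops - m_between) / (1 - m_between)
    return per_pop, overall


if __name__ == "__main__":
    # One biallelic locus, two populations with identical allele counts.
    per_pop, overall = beta_estimates([[[2, 2], [2, 2]]])
    print(per_pop, overall)
```

In this toy input the two populations have identical allele counts, so alleles match less often within a population (1/3, because an allele is never paired with itself) than between populations (1/2), and the estimate comes out negative. This illustrates, under the stated assumptions, how a parameter defined relative to a reference set can take negative values.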
