Detecting Selection in Population Trees: The Lewontin and Krakauer Test Extended

Detecting genetic signatures of selection is of great interest in many areas of research. Common approaches to separating selective from neutral processes focus on the variance of FST across loci, as does the original Lewontin and Krakauer (LK) test. Modern developments aim to minimize the false positive rate and to increase power by accounting for complex demographic structures. Another goal is to develop straightforward, parametric, and computationally tractable tests that can handle massive single-nucleotide polymorphism (SNP) data sets. Here, we propose an extension of the original LK statistic (TLK), named TF–LK, that uses a phylogenetic estimation of the population's kinship matrix (\(\mathcal{F}\)), thus accounting for historical branching and heterogeneity of genetic drift. Using forward simulations of SNP data under neutrality and selection, we confirm the relative robustness of TLK to complex demographic history, but we show that TF–LK is more powerful in most cases. The new statistic also outperforms a multinomial-Dirichlet-based model [estimated with Markov chain Monte Carlo (MCMC)] when historical branching occurs. Overall, TF–LK detects 15–35% more selected SNPs than TLK at low type I error rates (P < 0.001). Simulations also show that TLK and TF–LK follow a chi-square distribution provided the ancestral allele frequencies are not too extreme, suggesting that the chi-square distribution can be used to evaluate significance. The empirical distribution of TF–LK can be derived using simulations conditioned on the estimated \(\mathcal{F}\) matrix.
We apply this new test to SNP data from pig breeds and pinpoint outliers with TF–LK that go undetected with the less powerful TLK statistic. This new test offers one compromise between advanced SNP genetic data acquisition and outlier analyses.
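The idea of a kinship-aware statistic can be illustrated with a minimal sketch. The abstract does not state the formula, so the form used here is an assumption: a quadratic form in the deviations of the population allele frequencies from an estimated ancestral frequency, weighted by the inverse of the kinship matrix. The function name `flk_statistic`, its arguments, and the toy inputs are all hypothetical.

```python
import numpy as np

def flk_statistic(p, F):
    """Kinship-weighted quadratic form for one SNP (assumed form, not taken from the abstract).

    p : allele frequencies of the SNP in n populations
    F : n x n kinship matrix, assumed already estimated and invertible
    """
    p = np.asarray(p, dtype=float)
    F = np.asarray(F, dtype=float)
    ones = np.ones_like(p)

    # Generalized least squares estimate of the ancestral allele frequency.
    F_inv_p = np.linalg.solve(F, p)
    F_inv_1 = np.linalg.solve(F, ones)
    p0 = (ones @ F_inv_p) / (ones @ F_inv_1)

    # Deviations from the ancestral frequency, weighted by the inverse of
    # the assumed covariance p0 * (1 - p0) * F.
    resid = p - p0 * ones
    stat = resid @ np.linalg.solve(F, resid) / (p0 * (1.0 - p0))
    return stat, p0

# Toy example: three populations with equal, independent drift (diagonal F).
stat, p0 = flk_statistic([0.4, 0.5, 0.6], 0.1 * np.eye(3))
print(stat, p0)
```

With a diagonal kinship matrix, as in the toy call, the statistic reduces to a scaled sum of squared deviations from the mean frequency (here about 0.8 with an ancestral estimate of 0.5). A matrix with nonzero off-diagonal entries would instead down-weight deviations shared by closely related populations. Under the chi-square approximation mentioned in the abstract, such a value would be compared with a chi-square distribution; n - 1 degrees of freedom for n populations is an assumption here.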
