Estimation of the neutrality index.

The McDonald-Kreitman (MK) test is a simple and widely used test of selection in which the numbers of nonsilent and silent substitutions ($D_n$ and $D_s$) are compared with the numbers of nonsilent and silent polymorphisms ($P_n$ and $P_s$). The neutrality index, $NI = D_s P_n / (D_n P_s)$, which is the odds ratio (OR) of the MK table, measures the direction and degree of departure from neutral evolution. The mean of NI values across genes is often used to summarize patterns of selection in a species. Here, we show that this leads to statistical bias in both simulated and real data, to the extent that species showing a pattern of adaptive evolution can appear to be subject to weak purifying selection, and vice versa. We show that this bias can be removed by using a variant of the Cochran-Mantel-Haenszel procedure to estimate a weighted average OR. We also show that several point estimators of NI are statistically biased even when cutoff values are employed. We therefore suggest a new statistic for studying patterns of selection when data are sparse, the direction of selection: $DoS = D_n/(D_n + D_s) - P_n/(P_n + P_s)$.
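For illustration, the quantities defined above can be computed per gene and combined across genes as in the minimal Python sketch below. The container `MKTable`, the function names, and all counts are hypothetical. The pooled estimator is the classic Mantel-Haenszel odds ratio, with each gene weighted by its total count. The variant referred to above may weight genes differently, so treat this as an illustration of the idea rather than the exact estimator. Genes whose NI is undefined (zero $D_n$ or $P_s$) are skipped when averaging. This is one common convention and an assumption here.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class MKTable:
    """Hypothetical container for one gene's McDonald-Kreitman counts."""
    dn: int  # nonsilent substitutions (D_n)
    ds: int  # silent substitutions (D_s)
    pn: int  # nonsilent polymorphisms (P_n)
    ps: int  # silent polymorphisms (P_s)


def neutrality_index(t: MKTable) -> Optional[float]:
    """NI = D_s * P_n / (D_n * P_s); None when D_n or P_s is zero."""
    denom = t.dn * t.ps
    if denom == 0:
        return None
    return t.ds * t.pn / denom


def mean_ni(tables: Iterable[MKTable]) -> float:
    """Unweighted mean of per-gene NI, skipping genes where NI is undefined."""
    values = [ni for ni in map(neutrality_index, tables) if ni is not None]
    return mean(values)


def mantel_haenszel_ni(tables: Iterable[MKTable]) -> float:
    """Classic Mantel-Haenszel pooled odds ratio across genes.

    Each gene adds D_s*P_n/n to the numerator and D_n*P_s/n to the
    denominator, where n is that gene's total count. Raises
    ZeroDivisionError if every gene has D_n*P_s == 0.
    """
    num = 0.0
    den = 0.0
    for t in tables:
        n = t.dn + t.ds + t.pn + t.ps
        if n == 0:
            continue  # an empty table carries no information
        num += t.ds * t.pn / n
        den += t.dn * t.ps / n
    return num / den


def direction_of_selection(t: MKTable) -> Optional[float]:
    """DoS = D_n/(D_n + D_s) - P_n/(P_n + P_s); None if either row is empty."""
    d = t.dn + t.ds
    p = t.pn + t.ps
    if d == 0 or p == 0:
        return None
    return t.dn / d - t.pn / p


# Made-up counts for three genes (not data from any study).
genes = [
    MKTable(dn=2, ds=8, pn=4, ps=6),
    MKTable(dn=10, ds=10, pn=5, ps=10),
    MKTable(dn=6, ds=4, pn=1, ps=9),
]
print(round(mean_ni(genes), 3))             # 1.08
print(round(mantel_haenszel_ni(genes), 3))  # 0.524
print([round(direction_of_selection(g), 3) for g in genes])  # [-0.2, 0.167, 0.5]

# A sparse gene with P_s = 0: NI is undefined, but DoS is still defined.
sparse = MKTable(dn=3, ds=5, pn=2, ps=0)
print(neutrality_index(sparse), direction_of_selection(sparse))  # None -0.625
```

With these toy counts, the unweighted mean NI is above 1, which would suggest weak purifying selection. The pooled estimate is below 1, which points toward adaptive evolution. This is the kind of disagreement described above. The sparse gene shows why DoS is convenient when data are sparse.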