Expected Shannon Entropy and Shannon Differentiation between Subpopulations for Neutral Genes under the Finite Island Model

Shannon entropy H and related measures are increasingly used in molecular ecology and population genetics because (1) unlike measures based on heterozygosity or allele number, these measures weigh alleles in proportion to their population fraction, thus capturing a previously ignored aspect of allele frequency distributions that may be important in many applications; (2) these measures connect directly to the rich predictive mathematics of information theory; (3) Shannon entropy is completely additive and has an explicitly hierarchical nature; and (4) Shannon entropy-based differentiation measures obey strong monotonicity properties that heterozygosity-based measures lack. We derive simple new expressions for the expected value of the Shannon entropy of the equilibrium allele distribution at a neutral locus in a single isolated population under two models of mutation: the infinite allele model and the stepwise mutation model. Surprisingly, under each model the entropy of this complex stochastic system is expressible as a simple combination of well-known mathematical functions. Moreover, the entropy- and heterozygosity-based measures for each model are linked by simple relationships, which simulations show to be approximately valid even far from equilibrium. We also identify a bridge between the two models of mutation. We then apply our approach to subdivided populations that follow the finite island model, obtaining the Shannon entropy of the equilibrium allele distributions of the subpopulations and of the total population. We also derive the expected mutual information and normalized mutual information ("Shannon differentiation") between subpopulations at equilibrium, and identify the model parameters that determine them. We apply our measures to data from the common starling (Sturnus vulgaris) in Australia. Our measures provide a test for neutrality that is robust to violations of equilibrium assumptions, as verified on real-world data from starlings.
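For readers unfamiliar with the quantities named above, the following minimal Python sketch computes them from allele frequencies. It is an illustration only, not the derivation or estimator used in the paper: the function names are hypothetical, natural logarithms are assumed, mutual information is taken as the entropy of the pooled population minus the weighted mean entropy within subpopulations, and the normalization by the entropy of the subpopulation weights (ln K for K equally sized subpopulations) is one common convention assumed here.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy H = -sum(p * ln p) of an allele frequency vector."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p^2), shown for comparison with H."""
    return 1.0 - sum(p * p for p in freqs)

def mutual_information(subpops, weights=None):
    """Mutual information between allele identity and subpopulation identity.

    subpops: list of allele frequency vectors, one per subpopulation,
             all listing the same alleles in the same order.
    weights: relative subpopulation sizes (assumed equal if omitted).
    """
    k = len(subpops)
    if weights is None:
        weights = [1.0 / k] * k
    n_alleles = len(subpops[0])
    # Allele frequencies of the pooled (total) population.
    pooled = [sum(w * pop[i] for w, pop in zip(weights, subpops))
              for i in range(n_alleles)]
    h_total = shannon_entropy(pooled)
    h_within = sum(w * shannon_entropy(pop) for w, pop in zip(weights, subpops))
    return h_total - h_within

def shannon_differentiation(subpops, weights=None):
    """Mutual information divided by its maximum, the entropy of the weights.

    Assumption: this normalization is chosen for illustration and may differ
    in detail from the definition used in the paper.
    """
    k = len(subpops)
    if weights is None:
        weights = [1.0 / k] * k
    h_weights = shannon_entropy(weights)
    if h_weights == 0:
        return 0.0
    return mutual_information(subpops, weights) / h_weights

if __name__ == "__main__":
    # Hypothetical frequencies for three alleles in two subpopulations.
    pops = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
    print(shannon_entropy(pops[0]), heterozygosity(pops[0]))
    print(mutual_information(pops), shannon_differentiation(pops))
```

Under this normalization the value is 0 when all subpopulations share the same allele frequencies and 1 when they share no alleles.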
