Entropy and Information Approaches to Genetic Diversity and its Expression: Genomic Geography

This article highlights the advantages of entropy-based measures of genetic diversity, at levels ranging from gene expression to landscapes. Shannon’s entropy-based diversity is the standard measure for ecological communities. The exponentials of Shannon’s entropy and of the related “mutual information” express diversity intuitively, and they provide a generalised method for using microscopic behaviour to make macroscopic predictions under given conditions. Because entropy and information are hierarchical, they allow integrated modelling of diversity along a single DNA sequence, and between different sequences within and among populations, species, and so on. The aim is to identify the formal connections between genetic diversity and the flow of information to and from the environment.
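
As a minimal illustration of the quantities named above, the sketch below computes Shannon entropy, its exponential (often read as an "effective number" of equally frequent types), and mutual information from a joint frequency table. The function names, the use of natural logarithms, and the allele-by-population layout of the table are assumptions made for this example; they are not taken from the article.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of a list of frequencies summing to 1."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def effective_number(freqs):
    """Exponential of Shannon entropy: the number of equally frequent
    types that would give the same entropy."""
    return math.exp(shannon_entropy(freqs))


def mutual_information(joint):
    """Mutual information of a joint table.

    Assumed layout (hypothetical): joint[i][j] is the joint frequency of
    allele i and population j, with all entries summing to 1.
    """
    row_totals = [sum(row) for row in joint]        # allele marginals
    col_totals = [sum(col) for col in zip(*joint)]  # population marginals
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (row_totals[i] * col_totals[j]))
    return mi


if __name__ == "__main__":
    print(effective_number([0.25, 0.25, 0.25, 0.25]))
    print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
```

Under these assumptions, four equally frequent alleles give an effective number of 4 (up to floating-point rounding), and two populations that share no alleles give a mutual information of ln 2, whose exponential is 2, that is, two fully distinct groups. Identical allele frequencies in both populations give a mutual information of 0.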
