Towards an information theory of quantitative genetics

Quantitative genetics has evolved dramatically in the past century, and the proliferation of genetic data now enables the characterization of complex interactions beyond the scope of the field's theoretical foundations. In this paper, we lay the foundations of an alternative formulation of quantitative genetics based on information theory. Information theory provides sensitive measures of statistical dependencies among variables, and a natural mathematical language for an alternative view of quantitative genetics. In previous work we examined the information content of discrete functions and applied this formalism to the analysis of genetic data. We present here a set of relationships that both unifies the information measures for the set of discrete functions and uses them to express key quantitative genetic relationships. Information-theoretic measures of variable interdependency are used to identify significant interactions, and a general approach is described for inferring functional relationships within genotype and phenotype data. We present information-based measures of the genetic quantities penetrance, heritability, and degrees of statistical epistasis. Our scope here is limited to three-variable dependencies and independently segregating variants, which captures two-locus effects, genetic interactions, and two-phenotype pleiotropy. However, this formalism and general theory apply naturally to multi-variable interactions and higher-order complex dependencies, and can be adapted to account for population structure, linkage, and non-randomly segregating markers. This paper therefore lays the initial groundwork for a full formulation of quantitative genetics based on information theory.
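
As a rough illustration of the kind of three-variable dependency measure discussed above, the following minimal Python sketch computes plug-in estimates of mutual information and interaction information for two loci and one phenotype. It uses standard textbook definitions, not necessarily the specific measures developed in the paper. The function names, the sign convention for interaction information (conditional minus unconditional mutual information), and the toy XOR phenotype are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Plug-in (empirical) joint Shannon entropy, in bits, of discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    counts = Counter(joint)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    """I(X;Y|Z) - I(X;Y); sign conventions differ between authors (assumed here)."""
    return conditional_mutual_information(x, y, z) - mutual_information(x, y)


# Hypothetical toy data: two independently segregating biallelic loci and a
# purely epistatic (XOR) phenotype, one sample per genotype combination.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
ph = [a ^ b for a, b in zip(g1, g2)]

single_locus = mutual_information(g1, ph)              # each locus alone
two_locus = mutual_information(list(zip(g1, g2)), ph)  # both loci jointly
synergy = interaction_information(g1, g2, ph)

print(single_locus, two_locus, synergy)
```

In this toy case each locus alone carries no information about the phenotype, while the pair of loci determines it completely, which is the pattern that pairwise measures miss and three-variable measures detect. Plug-in estimates of this kind are biased for small samples, so real genotype and phenotype data would need a bias-corrected estimator or a permutation-based significance test.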
