Imputation of posterior linkage probability relations reveals a significant influence of structural 3D constraints on linkage disequilibrium

Genetic association studies have become increasingly important in unraveling the genetics of diseases and complex traits. Despite their value for modern genetics, conflicting conclusions often arise because experimental results are difficult to confirm and replicate. We argue that this problem stems largely from the application of statistical relation measures that are not appropriate for genomic data analysis, and we demonstrate that the standard measures used in genome-wide association studies and genomic linkage analysis carry a statistical bias. This bias may arise from violations of underlying assumptions (such as independence or stationarity) as well as from conceptual limitations of the measures themselves, such as a lack of invariance with respect to coding or an inability to reflect latent factors. Attempts to introduce unbiased relation measures that avoid these limitations are usually computationally expensive and do not scale to the large data sets typical of genomics applications. To address these problems, we propose a straightforwardly computable relation measure called Linkage Probability (LP). This measure provides the posterior probability of a relation between two categorical data sets and accounts for potential biases from latent variables. We compare several aspects of popular relation measures using an illustrative example and human genomics data. We demonstrate that applying LP to the analysis of single nucleotide polymorphisms (SNPs) reveals latent 3D steric effects within 1D SNP data that approximate the chromatin loops captured by high-resolution Hi-C maps.
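The point about missing invariance with respect to coding can be illustrated with a minimal sketch. It does not implement LP, whose definition is not given in this abstract. It only contrasts a coding-dependent measure (Pearson correlation on numeric genotype codes) with a coding-invariant one (mutual information on the same categories). The toy genotype vectors `x` and `y`, the relabeling `recode`, and the helper names `pearson` and `mutual_information` are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mutual_information(x, y):
    """Empirical mutual information (in nats) of two categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed category pairs: p(a,b) * log(p(a,b) / (p(a) p(b))).
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


# Hypothetical genotypes at two SNPs, coded as categories 0, 1, 2.
x = [0, 0, 1, 1, 2, 2, 0, 1, 2, 2]
y = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]

# Relabel the categories of x; the underlying partition of samples is unchanged.
recode = {0: 0, 1: 2, 2: 1}
x_recoded = [recode[g] for g in x]

r_original = pearson(x, y)
r_recoded = pearson(x_recoded, y)
mi_original = mutual_information(x, y)
mi_recoded = mutual_information(x_recoded, y)

print(f"Pearson r:  original={r_original:.3f}  recoded={r_recoded:.3f}")
print(f"Mutual inf: original={mi_original:.3f}  recoded={mi_recoded:.3f}")
```

In this toy case the correlation changes when the category labels are permuted, whereas the mutual information stays the same, because it depends only on the joint frequencies of the categories and not on their numeric labels.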
