A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. Developing such techniques requires addressing two inferential challenges: (1) population data are exchangeable, calling for methods that efficiently exploit the symmetries of the data, and (2) computing likelihoods is intractable because it requires integrating over a set of correlated, extremely high-dimensional latent variables. These challenges are traditionally tackled by likelihood-free methods that use scientific simulators to generate datasets and reduce them to hand-designed, permutation-invariant summary statistics, often leading to inaccurate inference. In this work, we develop an exchangeable neural network that performs summary-statistic-free, likelihood-free inference. Our framework can be applied in a black-box fashion across a variety of simulation-based tasks, both within and outside biology. We demonstrate the power of our approach on the recombination hotspot testing problem, outperforming the state of the art.
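To make the notion of an exchangeable network concrete, the snippet below is a minimal sketch of a permutation-invariant classifier in the spirit of Deep Sets: each haplotype (row of the genotype matrix) is embedded by a shared feature extractor, the row embeddings are combined by a symmetric pooling operation, and a small head maps the pooled representation to class logits (e.g., hotspot vs. no hotspot). The layer sizes, the use of 1-D convolutions over SNP positions, and all module names are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of an exchangeable (permutation-invariant) classifier
# for a genotype matrix of shape (n_individuals, n_sites).  Architecture
# details are illustrative assumptions, not the paper's exact model.
import torch
import torch.nn as nn


class ExchangeableClassifier(nn.Module):
    def __init__(self, n_sites: int, embed_dim: int = 64, n_classes: int = 2):
        super().__init__()
        # Shared per-individual feature extractor: applied identically to every
        # row, so relabeling individuals cannot change its per-row outputs.
        self.row_encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
            nn.Flatten(),
            nn.Linear(32 * 8, embed_dim),
            nn.ReLU(),
        )
        # Classifier acting on the pooled, order-free representation.
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_individuals, n_sites) binary haplotype matrices
        b, n, s = x.shape
        rows = x.reshape(b * n, 1, s)               # encode each row independently
        feats = self.row_encoder(rows).reshape(b, n, -1)
        pooled = feats.mean(dim=1)                  # symmetric pooling => invariance
        return self.head(pooled)                    # class logits


if __name__ == "__main__":
    model = ExchangeableClassifier(n_sites=40)
    x = torch.randint(0, 2, (8, 20, 40)).float()    # simulated haplotype matrices
    perm = x[:, torch.randperm(20), :]              # shuffle the individuals
    # Outputs agree up to floating-point error, illustrating exchangeability.
    print(torch.allclose(model(x), model(perm), atol=1e-5))
```

Because the only interaction across individuals happens through the mean pooling step, any permutation of the rows leaves the prediction unchanged by construction, which is the symmetry the abstract refers to.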
