Are Skyline Plot-Based Demographic Estimates Overly Dependent on Smoothing Prior Assumptions?

In Bayesian phylogenetics, the coalescent process provides an informative framework for inferring dynamical changes in the effective size of a population from a sampled phylogeny (or tree) of its sequences. Popular coalescent inference methods such as the Bayesian Skyline Plot, Skyride and Skygrid all model this population size with a discontinuous, piecewise-constant likelihood but apply a smoothing prior to ensure that posterior population size estimates transition gradually with time. These prior distributions implicitly encode extra population size information that is not available from the observed coalescent tree (data). Here we present a novel statistic, Ω, to quantify and disaggregate the relative contributions of the coalescent data and prior assumptions to the resulting posterior estimate precision. Our statistic also measures the additional mutual information introduced by such priors. Using Ω we show that, because it is surprisingly easy to over-parametrise piecewise-constant population models, common smoothing priors can lead to overconfident and potentially misleading conclusions, even under robust experimental designs. We propose Ω as a useful tool for detecting when posterior estimate precision is overly reliant on prior choices.
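The over-parametrisation concern can be illustrated with a minimal sketch. It is not the Ω statistic itself, which this abstract does not define. It assumes the standard Kingman coalescent with a piecewise-constant population size, where a segment containing m coalescent events has maximum likelihood estimate equal to the sum of C(k, 2)·Δt over its intervals, divided by m, and where the Fisher information about the log population size of that segment is m. The function names, the grouping rule (a fixed number of events per segment) and the parameter values are all hypothetical choices made for illustration.

```python
import random
from math import comb


def simulate_coalescent_intervals(n_samples, pop_size, rng):
    """Simulate Kingman coalescent waiting times under a constant population size.

    Returns a list of (k, dt) pairs for k = n_samples, ..., 2 lineages, where
    dt is exponentially distributed with rate C(k, 2) / pop_size.
    """
    intervals = []
    for k in range(n_samples, 1, -1):
        rate = comb(k, 2) / pop_size
        intervals.append((k, rng.expovariate(rate)))
    return intervals


def skyline_mle(intervals, events_per_segment):
    """Piecewise-constant maximum likelihood estimates of population size.

    Consecutive intervals are grouped into segments; each interval ends in one
    coalescent event, so a segment of m intervals holds m events. Returns a
    list of (m, estimate) pairs. Under this model m is also the Fisher
    information that the tree carries about the log population size of the
    segment, so segments with few events are weakly informed by the data.
    """
    estimates = []
    for start in range(0, len(intervals), events_per_segment):
        group = intervals[start:start + events_per_segment]
        m = len(group)
        scaled_time = sum(comb(k, 2) * dt for k, dt in group)
        estimates.append((m, scaled_time / m))
    return estimates


if __name__ == "__main__":
    rng = random.Random(42)  # hypothetical seed, for repeatability only
    tree = simulate_coalescent_intervals(n_samples=50, pop_size=1000.0, rng=rng)
    for events in (1, 7, 49):
        print(events, [round(est) for _, est in skyline_mle(tree, events)])
```

With one event per segment the estimates vary widely around the true size, because each segment is informed by a single coalescent event; any smoothness in a posterior over so many segments must then come largely from the prior rather than from the tree.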
