Model fitting and inference under latent equilibrium processes

This paper presents a methodology for model fitting and inference for Bayesian models of the form f(Y | X, θ) f(X | θ) f(θ), where Y is the (set of) observed data, θ is a set of model parameters and X is an unobserved (latent) stationary stochastic process induced by the first-order transition model f(X(t+1) | X(t), θ), with X(t) denoting the state of the process at time (or generation) t. The crucial feature of this type of model is that, given θ, the transition model f(X(t+1) | X(t), θ) is known, but the distribution of the process in equilibrium, f(X | θ), is intractable, and hence unknown, except in very special cases. Note also that the data Y are assumed to be observed when the underlying process is in equilibrium; in other words, the data are not collected dynamically over time. We refer to such a specification as a latent equilibrium process (LEP) model.

The LEP model is motivated by problems in population genetics (though other applications are discussed), where it is of interest to learn about parameters such as mutation and migration rates and population sizes, given a sample of allele frequencies at one or more loci. In such problems it is natural to assume that the distribution of the observed allele frequencies depends on the true (unobserved) population allele frequencies, whereas the distribution of the true allele frequencies is only indirectly specified through a transition model.

As a hierarchical specification, the LEP model is naturally fitted within a Bayesian framework, usually via Markov chain Monte Carlo (MCMC). However, we demonstrate that, for LEP models, implementing MCMC is far from straightforward. The main contribution of this paper is a methodology for implementing MCMC for LEP models. We demonstrate our approach on population genetics problems with both simulated and real data sets. The resulting model fitting is computationally intensive, so we also discuss parallel implementation of the procedure in special cases.
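To make the model structure concrete, the Python sketch below builds a deliberately small, hypothetical LEP model: a single biallelic locus in a Wright-Fisher population of N gene copies with symmetric mutation rate u (so θ = (N, u)), with the data Y obtained by binomial sampling of n gene copies. The equilibrium distribution is defined only implicitly, by the stationarity condition f(x' | θ) = Σ_x f(x' | x, θ) f(x | θ). The sketch approximates f(X | θ) by running the known transition model forward (with arbitrary burn-in and thinning), checks this against an exact power-iteration solution that is feasible only because the toy state space has N + 1 states, and evaluates the marginal likelihood f(Y | θ) = Σ_x f(Y | x, θ) f(x | θ). All function names, the mutation/drift model and the numerical settings are illustrative assumptions; this is not the paper's algorithm.

```python
import math
import random

# Hypothetical toy LEP model (an illustration, not the paper's method):
# one biallelic locus in a Wright-Fisher population of N gene copies with
# symmetric mutation rate u, so theta = (N, u). X(t) is the count of allele A
# in generation t; Y is the count of A in a sample of n gene copies.


def binom_pmf(k, n, p):
    """Binomial probability mass function P(K = k), K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def mutated_freq(x, N, u):
    """Frequency of A after symmetric mutation, before random drift."""
    p = x / N
    return p * (1.0 - u) + (1.0 - p) * u


def step(x, N, u, rng):
    """Draw X(t+1) ~ f(X(t+1) | X(t) = x, theta): mutation, then binomial drift."""
    q = mutated_freq(x, N, u)
    return sum(rng.random() < q for _ in range(N))


def simulated_equilibrium(N, u, burn_in, n_draws, thin, rng):
    """Approximate f(X | theta) by running the transition model forward.

    Starts at x = N // 2, discards burn_in steps, then records every
    thin-th state; returns the empirical distribution over 0..N.
    """
    x = N // 2
    for _ in range(burn_in):
        x = step(x, N, u, rng)
    counts = [0] * (N + 1)
    for _ in range(n_draws):
        for _ in range(thin):
            x = step(x, N, u, rng)
        counts[x] += 1
    return [c / n_draws for c in counts]


def exact_equilibrium(N, u, iters=2000):
    """Stationary distribution by power iteration, pi <- pi P.

    Feasible only because this toy state space is tiny (N + 1 states).
    """
    P = [[binom_pmf(j, N, mutated_freq(i, N, u)) for j in range(N + 1)]
         for i in range(N + 1)]
    pi = [1.0 / (N + 1)] * (N + 1)
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(N + 1)) for j in range(N + 1)]
    return pi


def marginal_likelihood(y, n, pi, N):
    """f(Y = y | theta) = sum_x f(y | x, theta) f(x | theta), Y ~ Binomial(n, x/N)."""
    return sum(pi[x] * binom_pmf(y, n, x / N) for x in range(N + 1))


if __name__ == "__main__":
    rng = random.Random(1)
    N, u = 20, 0.05
    approx = simulated_equilibrium(N, u, burn_in=200, n_draws=4000, thin=10, rng=rng)
    exact = exact_equilibrium(N, u)
    tv = 0.5 * sum(abs(a - b) for a, b in zip(approx, exact))
    print(f"total variation distance (simulated vs exact): {tv:.3f}")
    print(f"f(Y = 3 | theta), sample of n = 10: {marginal_likelihood(3, 10, exact, N):.4f}")
```

With several populations linked by migration, the joint state space grows as (N + 1)^K for K populations, so an exact route like exact_equilibrium quickly becomes infeasible and f(X | θ) is reachable only through simulation. A standard Metropolis-Hastings update of (X, θ) targeting the joint posterior would need f(X | θ) in its acceptance ratio, which is one way to see why implementing MCMC for LEP models is not straightforward.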