One size does not fit all: Customizing MCMC methods for hierarchical models using NIMBLE

Abstract: Improved efficiency of Markov chain Monte Carlo (MCMC) facilitates all aspects of statistical analysis with Bayesian hierarchical models. Identifying strategies to improve MCMC performance is becoming increasingly crucial as the complexity of models, and the run times to fit them, increase. We evaluate different strategies for improving MCMC efficiency with the open‐source software NIMBLE (R package nimble), using common ecological models of species occurrence and abundance as examples. We ask how MCMC efficiency depends on model formulation, model size, data, and sampling strategy. For multiseason and/or multispecies occupancy models and for N‐mixture models, we compare the efficiency of sampling discrete latent states vs. integrating over them, including more vs. fewer hierarchical model components, and univariate vs. block‐sampling methods. We include the common MCMC tool JAGS in comparisons. For simple models, there is little practical difference between computational approaches. As model complexity increases, model formulation and sampling strategy interact strongly in their effects on MCMC efficiency. There is no one‐size‐fits‐all best strategy, but rather problem‐specific best strategies related to model structure and type. In all but the simplest cases, NIMBLE's default or customized configuration achieves much higher efficiency than JAGS. In the two most complex examples, NIMBLE was 10–12 times more efficient than JAGS. We find NIMBLE is a valuable tool for many ecologists using Bayesian inference, particularly for complex models where JAGS is prohibitively slow. Our results highlight the need for more guidelines and customizable approaches to fitting hierarchical models, so that practitioners can make the most of occupancy and other hierarchical models. By implementing model‐generic MCMC procedures in open‐source software, including the NIMBLE extensions for integrating over latent states (implemented in the R package nimbleEcology), we have made progress toward this aim.
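
To illustrate what "integrating over latent states" means, the following sketch uses the basic single-season occupancy model. The model and its notation are assumptions chosen for illustration and are not taken from the abstract: site i has a latent occupancy state z_i with occupancy probability \psi, and y_{ij} is the detection or non-detection at visit j = 1, ..., J, with detection probability p. Summing over the two possible values of z_i gives a likelihood in which z_i no longer appears, so an MCMC sampler only has to update \psi and p.

```latex
% Latent-state formulation (assumed illustrative model)
z_i \sim \mathrm{Bernoulli}(\psi), \qquad
y_{ij} \mid z_i \sim \mathrm{Bernoulli}(z_i \, p), \quad j = 1, \dots, J

% Marginal likelihood for site i after summing over z_i \in \{0, 1\}
\Pr(y_{i1}, \dots, y_{iJ})
  = \psi \prod_{j=1}^{J} p^{y_{ij}} (1 - p)^{1 - y_{ij}}
  + (1 - \psi) \, \mathbb{1}\!\left\{ \sum_{j=1}^{J} y_{ij} = 0 \right\}
```

The first term covers an occupied site with the observed detection history; the second covers an unoccupied site, which is only possible when the species was never detected there.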
