Statistical inference for stochastic simulation models – theory and application

Statistical models are the traditional choice to test scientific theories when observations, processes or boundary conditions are subject to stochasticity. Many important systems in ecology and biology, however, are difficult to capture with statistical models. Stochastic simulation models offer an alternative, but they were hitherto associated with a major disadvantage: their likelihood functions usually cannot be calculated explicitly, which makes it difficult to couple them to well-established statistical theory such as maximum likelihood and Bayesian statistics. A number of new methods, among them Approximate Bayesian Computation (ABC) and Pattern-Oriented Modelling, bypass this limitation. These methods share three main principles: aggregation of simulated and observed data via summary statistics, likelihood approximation based on the summary statistics, and efficient sampling. We discuss the principles, advantages and caveats of these methods, and demonstrate their potential for integrating stochastic simulation models into a unified framework for statistical modelling.
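The three shared principles can be illustrated with rejection sampling, the simplest ABC variant. The sketch below is a minimal illustration under stated assumptions and is not the procedure of any particular study. The following are all hypothetical choices made for this example:

* the toy population model (`simulate`);
* the use of mean and standard deviation as summary statistics;
* the uniform prior on the survival probability;
* the tolerance value.

The sketch maps onto the principles as follows:

1. Each time series is aggregated into summary statistics (principle 1).
2. A parameter draw is accepted only when its simulated summaries lie within the tolerance of the observed ones. This acts as a crude approximation of the likelihood of the observed summaries (principle 2).
3. Drawing candidates from the prior is the sampling step (principle 3), shown here in its least efficient form.

```python
import math
import random
import statistics


def simulate(survival, rng, n0=100, immigrants=10, steps=20):
    """Hypothetical toy stochastic population model: each individual survives
    a time step with probability `survival`, and a fixed number of immigrants
    arrives every step. Returns the simulated time series of counts."""
    n = n0
    counts = []
    for _ in range(steps):
        survivors = sum(1 for _ in range(n) if rng.random() < survival)
        n = survivors + immigrants
        counts.append(n)
    return counts


def summary_stats(counts):
    """Principle 1: aggregate a time series into summary statistics
    (assumed here: mean and population standard deviation)."""
    return (statistics.mean(counts), statistics.pstdev(counts))


def distance(s_sim, s_obs):
    """Euclidean distance between simulated and observed summaries."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_sim, s_obs)))


def abc_rejection(observed, n_draws=3000, tolerance=5.0, seed=1):
    """Rejection ABC: return the parameter draws whose simulated summaries
    lie within `tolerance` of the observed summaries."""
    rng = random.Random(seed)
    s_obs = summary_stats(observed)
    accepted = []
    for _ in range(n_draws):
        # Principle 3 (sampling): draw a candidate from a uniform prior on [0, 1].
        theta = rng.uniform(0.0, 1.0)
        s_sim = summary_stats(simulate(theta, rng))
        # Principle 2 (likelihood approximation): the probability of accepting
        # a given theta approximates, up to a constant, the likelihood of the
        # observed summaries under that theta.
        if distance(s_sim, s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    # "Observed" data generated with a known survival probability of 0.8.
    observed = simulate(0.8, random.Random(42))
    posterior = abc_rejection(observed)
    print(f"accepted {len(posterior)} draws")
    if posterior:
        print(f"posterior mean survival: {statistics.mean(posterior):.3f}")
```

Rejection sampling discards most simulations. In practice, more efficient samplers such as ABC-MCMC or sequential Monte Carlo ABC are usually preferred. In all variants, the accepted draws approximate the true posterior only to the extent that the tolerance is small and the summary statistics are informative about the parameters.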
