Parameter identifiability, constraint, and equifinality in data assimilation with ecosystem models.

One of the most desirable goals of scientific endeavor is to discover the laws or principles behind seemingly mysterious phenomena. A cherished example is Isaac Newton's law of universal gravitation, which can precisely describe the fall of an apple from a tree and predict the existence of Neptune. Scientists pursue mechanistic understanding of natural phenomena in an attempt to develop relatively simple equations, with a small number of parameters, that describe patterns in nature and predict future changes. Within this tradition, uncertainty was long considered incompatible with science (Klir 2006). That view began to change only in the early 20th century, when physicists studied the behavior of matter and energy at the scale of atoms and subatomic particles in quantum mechanics. In 1927, Heisenberg observed that an electron cannot be assigned an exact location; it can only be assigned probable locations within its orbital, which can be described by a probability distribution (Heisenberg 1958). Quantum mechanics led scientists to recognize that uncertainty is inherent in nature and is an unavoidable, essential property of most systems. Since then, scientists have developed methods to analyze and describe uncertainty.

Ecosystem ecologists have recently directed attention to studying uncertainty in ecosystem processes. The Bayesian paradigm allows ecologists to generate posterior probability density functions (PDFs) for parameters of ecosystem models by combining prior PDFs with measurements (Dowd and Meyer 2003). Xu et al. (2006), for example, evaluated uncertainty in parameter estimates and in projected carbon sinks within a Bayesian framework, using six data sets and a terrestrial ecosystem (TECO) model.
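
The Bayesian update described above can be written compactly. The notation here is an assumption made for illustration and is not taken from the cited studies: \theta denotes the vector of model parameters and Z the set of measurements.

```latex
% Assumed notation: \theta = model parameters, Z = measurements.
% p(\theta) is the prior PDF, p(Z \mid \theta) the likelihood,
% and p(\theta \mid Z) the posterior PDF.
p(\theta \mid Z) \;=\; \frac{p(Z \mid \theta)\, p(\theta)}{p(Z)}
\;\propto\; p(Z \mid \theta)\, p(\theta)
```

Because the normalizing term p(Z) does not depend on \theta, sampling methods only need the product of likelihood and prior.
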
The Bayesian framework has been applied to the assimilation of eddy-flux data into the simplified photosynthesis and evapotranspiration (SIPNET) model, both to evaluate how much information net ecosystem exchange (NEE) observations contain for constraining process parameters (e.g., Braswell et al. 2005) and to partition NEE into its component fluxes (Sacks et al. 2006). Verstraeten et al. (2008) evaluated error propagation and uncertainty in evaporation, soil moisture content, and net ecosystem productivity when assimilating remotely sensed data. Nevertheless, uncertainty in data assimilation with ecosystem models has not been systematically explored.

Cressie et al. (2009) proposed a general framework to account for multiple sources of uncertainty: in measurements, in sampling, in specification of the process, in parameters, and in initial and boundary conditions. They proposed separating these sources of uncertainty using a conditional-probabilistic approach. With this approach, ecologists need to build a hierarchical statistical model based on Bayes' theorem and to use Markov chain Monte Carlo (MCMC) techniques for sampling before they can obtain probability distributions of the parameters or projected state variables of interest and so quantify uncertainty. It is an elegant framework for quantifying uncertainties in the parameters and processes of ecological models.

At the core of uncertainty analysis is parameter identifiability. When a set of data can constrain the parameters of a given model structure, maximum likelihood values of those parameters can be identified, and the parameters are said to be identifiable. Conversely, data assimilation faces the issue of equifinality (Beven 2006): different models, or different parameter values of the same model, may fit the data equally well, leaving no basis for distinguishing which models or parameter values are better.
Thus, identifiability is reflected in both parameter constraint and equifinality. This essay first reviews current knowledge of parameter identifiability and then discusses the major factors that influence it. To enrich the discussion, we use examples from ecosystem ecology, which differ from the example on population dynamics of harbor seals in Cressie et al. (2009).
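
The contrast between a constrained quantity and equifinal parameters can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption chosen for illustration rather than anything taken from the essay or from the TECO or SIPNET models: the toy model y = a * b * x, the synthetic data, the uniform priors, and the function names. Because a and b enter only through their product, a random-walk Metropolis sampler (one simple MCMC technique) should constrain a * b tightly while leaving a and b individually poorly constrained.

```python
import math
import random
import statistics


def simulate_data(a_true=2.0, b_true=1.5, sigma=1.0, seed=0):
    """Synthetic observations from the toy model y = a * b * x plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(1, 11)]
    ys = [a_true * b_true * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def log_likelihood(a, b, xs, ys, sigma):
    """Gaussian log-likelihood (up to a constant); it depends on a and b only via a * b."""
    return -0.5 * sum(((y - a * b * x) / sigma) ** 2 for x, y in zip(xs, ys))


def metropolis(xs, ys, sigma=1.0, n_steps=20000, step=0.05, lo=0.1, hi=10.0, seed=1):
    """Random-walk Metropolis sampler with uniform priors on [lo, hi] for a and b."""
    rng = random.Random(seed)
    a, b = 1.0, 1.0
    logp = log_likelihood(a, b, xs, ys, sigma)
    samples = []
    for _ in range(n_steps):
        a_new = a + rng.gauss(0.0, step)
        b_new = b + rng.gauss(0.0, step)
        # Proposals outside the prior bounds have zero prior density and are rejected.
        if lo <= a_new <= hi and lo <= b_new <= hi:
            logp_new = log_likelihood(a_new, b_new, xs, ys, sigma)
            if rng.random() < math.exp(min(0.0, logp_new - logp)):
                a, b, logp = a_new, b_new, logp_new
        samples.append((a, b))
    return samples[n_steps // 2:]  # discard the first half as burn-in


def coefficient_of_variation(values):
    """Standard deviation relative to the mean (all sampled values are positive)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def summarize(samples):
    """Relative posterior spread of a, b, and the product a * b."""
    a_vals = [a for a, _ in samples]
    b_vals = [b for _, b in samples]
    ab_vals = [a * b for a, b in samples]
    return {
        "cv_a": coefficient_of_variation(a_vals),
        "cv_b": coefficient_of_variation(b_vals),
        "cv_ab": coefficient_of_variation(ab_vals),
        "mean_ab": statistics.fmean(ab_vals),
    }


if __name__ == "__main__":
    xs, ys = simulate_data()
    print(summarize(metropolis(xs, ys)))
```

In this sketch the relative spread of the product should be far smaller than that of either parameter alone: the data identify a * b, but many (a, b) pairs fit equally well. Adding a second data stream that depends on a or b separately would be one way to break the equifinality in such a toy case.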

[1]  Aaron M. Ellison,et al.  A Primer of Ecological Statistics , 2004 .

[2]  M. J. Bayarri,et al.  P Values for Composite Null Models , 2000 .

[3]  R. Monson,et al.  Model‐data synthesis of diurnal and seasonal CO2 fluxes at Niwot Ridge, Colorado , 2006 .

[4]  Jianming Ye On Measuring and Correcting the Effects of Data Mining and Model Selection , 1998 .

[5]  Brian Dennis,et al.  Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. , 2007, Ecology letters.

[6]  D. Rubin Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician , 1984 .

[7]  Fred S. Guthery,et al.  Invited Paper: Information Theory in Wildlife Science: Critique and Viewpoint , 2005 .

[8]  A. Jansen Bayesian Methods for Ecology , 2009 .

[9]  Robert P Freckleton,et al.  Why do we still use stepwise modelling in ecology and behaviour? , 2006, The Journal of animal ecology.

[10]  V. Vieland,et al.  Statistical Evidence: A Likelihood Paradigm , 1998 .

[11]  Dennis L. Jackson Revisiting Sample Size and Number of Parameter Estimates: Some Support for the N:q Hypothesis , 2003 .

[12]  B. Efron Bayesians, Frequentists, and Scientists , 2005 .

[13]  J. Berger The case for objective Bayesian analysis , 2006 .

[14]  E L Ionides,et al.  Inference for nonlinear dynamical systems , 2006, Proceedings of the National Academy of Sciences.

[15]  N Thompson Hobbs,et al.  Alternatives to statistical hypothesis testing in ecology: a guide to self teaching. , 2006, Ecological applications : a publication of the Ecological Society of America.

[16]  Ernst Linder,et al.  Estimating diurnal to annual ecosystem parameters by synthesis of a carbon flux model with eddy covariance net ecosystem exchange observations , 2005 .

[17]  B. Efron Why Isn't Everyone a Bayesian? , 1986 .

[18]  Catherine A Calder,et al.  Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling. , 2009, Ecological applications : a publication of the Ecological Society of America.

[19]  Robert Clement,et al.  On the validation of models of forest CO2 exchange using eddy covariance data: some perils and pitfalls. , 2005, Tree physiology.

[20]  M. Snover,et al.  Comments on “Using Bayesian state-space modelling to assess the recovery and harvest potential of the Hawaiian green sea turtle stock” , 2008 .

[21]  P. Valpine Monte Carlo State-Space Likelihoods by Weighted Posterior Kernel Density Estimation , 2004 .

[22]  S. Hurlbert Pseudoreplication and the Design of Ecological Field Experiments , 1984 .

[23]  Richard T. Conant,et al.  Best Practices in Prediction for Decision‐Making: Lessons from the Atmospheric and Earth Sciences , 2003 .

[24]  Joseph Hilbe,et al.  Data Analysis Using Regression and Multilevel/Hierarchical Models , 2009 .

[25]  James S. Clark,et al.  Why environmental scientists are becoming Bayesians , 2004 .

[26]  D. Mayo,et al.  Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction , 2006, The British Journal for the Philosophy of Science.

[27]  C. Chatfield Model uncertainty, data mining and statistical inference , 1995 .

[28]  S. Raghu,et al.  The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations , 2005 .

[29]  Improved Estimation of Normalizing Constants From Markov Chain Monte Carlo Output , 2008 .

[30]  George G. Woodworth,et al.  Biostatistics: A Bayesian Introduction , 2004 .

[31]  Bradley P. Carlin,et al.  Bayesian measures of model complexity and fit , 2002 .

[32]  D. Haydon Models for Ecological Data: An Introduction , 2008 .

[33]  L. Prior,et al.  Ecological Models and Data in R , 2011 .

[34]  Don Edwards,et al.  Comment: The First Data Analysis Should be Journalistic , 1996 .

[35]  Dennis D. Baldocchi,et al.  Estimating parameters in a land‐surface model by applying nonlinear inversion to eddy covariance flux measurements from eight FLUXNET sites , 2007 .

[36]  B. Law,et al.  An improved analysis of forest carbon dynamics using data assimilation , 2005 .

[37]  Josep G. Canadell,et al.  Sustainability of terrestrial carbon sequestration: A case study in Duke Forest with inversion approach , 2003 .

[38]  B. Efron Empirical Bayes Methods for Combining Likelihoods , 1996 .

[39]  Frank Veroustraete,et al.  On uncertainties in carbon flux modelling and remotely sensed data assimilation: The Brasschaat pixel case , 2008 .

[40]  Perry de Valpine,et al.  Better Inferences from Population-Dynamics Experiments Using Monte Carlo State-Space Likelihood Methods , 2003 .

[41]  Toshinori Okuyama,et al.  Combining Genetic and Ecological Data to Estimate Sea Turtle Origins , 2005 .

[42]  R. Freckleton,et al.  The Ecological Detective: Confronting Models with Data , 1999 .

[43]  Michael Dowd,et al.  A Bayesian approach to the ecosystem inverse problem , 2003 .

[44]  James M. Robins,et al.  Asymptotic Distribution of P Values in Composite Null Models , 2000 .

[45]  W. Knorr,et al.  Inversion of terrestrial ecosystem model parameter values against eddy covariance measurements by Monte Carlo sampling , 2005 .

[46]  R. Hilborn,et al.  State-space likelihoods for nonlinear fisheries time-series , 2005 .

[47]  S. Carpenter,et al.  Ecological forecasts: an emerging imperative. , 2001, Science.

[48]  Mark Von Tress,et al.  Generalized, Linear, and Mixed Models , 2003, Technometrics.

[49]  Brian Dennis,et al.  Discussion: Should Ecologists Become Bayesians? , 1996 .

[50]  A. O'Hagan,et al.  Kendall's Advanced Theory of Statistics, Vol. 2b: Bayesian Inference. , 1996 .

[51]  James S. Clark,et al.  Hierarchical Modelling for the Environmental Sciences: Statistical Methods and Applications , 2006 .

[52]  Peter A. Coppin,et al.  Parameter estimation in surface exchange models using nonlinear inversion: how many parameters can we estimate and which measurements are most useful? , 2001 .

[53]  C. Morris Parametric Empirical Bayes Inference: Theory and Applications , 1983 .

[54]  Keith Beven,et al.  A manifesto for the equifinality thesis , 2006 .

[55]  David R. Anderson,et al.  Model selection and multimodel inference : a practical information-theoretic approach , 2003 .

[56]  Lauri Oksanen,et al.  Logic of experiments in ecology: is pseudoreplication a pseudoissue? , 2001 .

[57]  E. Steyerberg,et al.  [Regression modeling strategies]. , 2011, Revista espanola de cardiologia.

[58]  Bradley P. Carlin,et al.  Bayes and Empirical Bayes Methods for Data Analysis , 1996, Stat. Comput..

[59]  Benjamin M. Bolker,et al.  Ecological Models and Data in R , 2008 .

[60]  Jay M. Ver Hoef,et al.  A Bayesian hierarchical model for monitoring harbor seal changes in Prince William Sound, Alaska , 2003, Environmental and Ecological Statistics.