Finding the appropriate level of complexity for a simulation model: An example with a forest growth model

Model complexity is a fundamental concern for both model developers and model users. In this study, we first investigate how over- and under-fitting of a driving function in a simulation model influences the model's predictive ability. We then investigate whether model selection approaches succeed in selecting the driving functions with the best predictive ability. We address these questions through an example with the forest simulator SORTIE-ND. Using maximum likelihood methods and individual tree growth data, we parameterize five growth functions of increasing complexity. We then incorporate each growth function into SORTIE-ND and test predicted growth against independent data. In this comparison, the simplest and the most complex growth functions had the poorest predictive ability, while functions of intermediate complexity had the best. The poor predictive ability of the simplest model is caused by poor approximation of the system, whereas that of the most complex model is caused by biased parameter estimates. A growth function of intermediate complexity was the most parsimonious model, simultaneously minimizing error due to approximation and error due to estimation. The model selection criteria AIC and BIC both selected complex functions that were over-fitted according to the independent data comparison, although BIC came closer than AIC to choosing the model that minimized prediction error. In this example, BIC is therefore the more appropriate model selection criterion. Both model developers and model users should remember that more complex models do not always predict better.
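
As a rough illustration of how AIC and BIC can disagree when ranking candidate functions of increasing complexity, the sketch below applies the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L to five hypothetical candidates. The candidate names, parameter counts, log-likelihoods, and sample size are invented for illustration only; they are not results from this study and do not represent the SORTIE-ND growth functions.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian (Schwarz) information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical candidates, NOT values from the study:
# name -> (number of estimated parameters k, maximized log-likelihood)
candidates = {
    "m1": (2, -1250.0),
    "m2": (3, -1225.0),
    "m3": (5, -1210.0),
    "m4": (7, -1206.0),
    "m5": (10, -1202.5),
}
n_obs = 500  # assumed number of observations used in fitting

aic_scores = {name: aic(ll, k) for name, (k, ll) in candidates.items()}
bic_scores = {name: bic(ll, k, n_obs) for name, (k, ll) in candidates.items()}

# Lower is better for both criteria.
best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)

for name in candidates:
    print(f"{name}: AIC={aic_scores[name]:.2f}  BIC={bic_scores[name]:.2f}")
print("AIC selects:", best_aic)
print("BIC selects:", best_bic)
```

Because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 whenever n is larger than about 7, BIC tends to favor simpler candidates than AIC on the same fits. Neither criterion replaces a comparison against independent data, which is the test the study relies on.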
