An evaluation of the statistical methods for testing the performance of crop models with observed data

Calibration and evaluation are two important steps prior to the application of a crop simulation model. The objective of this paper was to review common statistical methods used for crop model calibration and evaluation. A group of deviation statistics was reviewed, including root mean squared error (RMSE), normalized RMSE (nRMSE), mean absolute error (MAE), mean error (E), paired t-test, index of agreement (d), modified index of agreement (d1), revised index of agreement (d1′), modeling efficiency (EF) and revised modeling efficiency (EF1). A case study of the statistical evaluation was conducted for the DSSAT Cropping System Model (CSM) using 10 experimental datasets for maize, peanut, soybean, wheat and potato from Brazil, China, Ghana, and the USA. The results indicated that R² was not a good statistic for model evaluation because it is insensitive to the regression coefficients (α and β) of the linear model y=α+βx+e. However, linear regression can be used for model evaluation (test H0: α=0, β=1) if the error term (e) is tested for autocorrelation, normality and heteroskedasticity, or if the proper data transformations are made. The results also illustrated that statistical evaluation of the total dataset across treatments might be insufficient. Hence, each treatment must be evaluated separately to reach the right conclusion, especially when evaluating soil water content under different planting date treatments and soil mineral N under different N treatments. Co-variability analysis among the dimensionless statistics (d, d1, d1′, EF and EF1) indicated that d and EF are inflated by the sum of squares-based deviations, i.e., larger deviations carry more weight in the statistic than smaller deviations because of the squared term. However, EF had a larger range and a clear physical meaning at EF=0, making it superior to d. 
Values of d=0.75 were obtained from regression with all positive values of EF (EF⩾0), indicating that d⩾0.75 and EF⩾0 should be the minimum values for plant growth evaluation. For the evaluation of soil outputs, d⩾0.60 and EF⩾−1.0 should be the minimum values, combined with a t-test, because the soil parameters in the DSSAT SOIL module are more difficult to calibrate than plant growth parameters owing to insufficient observed soil data. By their statistical nature, no single statistic is more robust than the others, but some statistics are highly correlated. Therefore, statistics drawn from each of the following correlated groups (RMSE, MAE), (E, t-test), (d, d1, d1′) and (EF, EF1) may be combined in one assessment of model evaluation so that a representative statistical conclusion can be obtained with respect to model performance.
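
As an illustration of the deviation statistics named above, the following is a minimal Python sketch. It assumes the commonly used textbook definitions (the index of agreement of Willmott and the efficiency of Nash and Sutcliffe); the abstract does not state the formulas, so the paper's exact formulation may differ. The function name `deviation_statistics`, the dictionary keys, and the sample numbers are hypothetical and are not taken from the paper's datasets. The revised index d1′ is left out of this sketch.

```python
import math
import statistics


def deviation_statistics(observed, simulated):
    """Return common deviation statistics for paired observed/simulated values.

    Assumed (standard) definitions, not confirmed against the paper:
      RMSE  = sqrt(mean((S - O)^2))
      nRMSE = 100 * RMSE / mean(O)
      MAE   = mean(|S - O|)
      E     = mean(S - O)
      d     = 1 - sum((S - O)^2) / sum((|S - Obar| + |O - Obar|)^2)
      d1    = 1 - sum(|S - O|)   / sum(|S - Obar| + |O - Obar|)
      EF    = 1 - sum((S - O)^2) / sum((O - Obar)^2)
      EF1   = 1 - sum(|S - O|)   / sum(|O - Obar|)
    """
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two paired sequences with at least 2 values")

    n = len(observed)
    obs_mean = sum(observed) / n
    diffs = [s - o for o, s in zip(observed, simulated)]

    sum_sq = sum(x * x for x in diffs)    # squared deviations
    sum_abs = sum(abs(x) for x in diffs)  # absolute deviations

    # Potential error terms used in the denominators of d and d1.
    potential = [abs(s - obs_mean) + abs(o - obs_mean)
                 for o, s in zip(observed, simulated)]

    rmse = math.sqrt(sum_sq / n)
    mean_error = sum(diffs) / n

    # Paired t statistic on the differences; undefined if they are constant.
    sd = statistics.stdev(diffs)
    t_stat = mean_error / (sd / math.sqrt(n)) if sd > 0 else float("nan")

    return {
        "RMSE": rmse,
        "nRMSE": 100.0 * rmse / obs_mean,
        "MAE": sum_abs / n,
        "E": mean_error,
        "t": t_stat,
        "d": 1.0 - sum_sq / sum(p * p for p in potential),
        "d1": 1.0 - sum_abs / sum(potential),
        "EF": 1.0 - sum_sq / sum((o - obs_mean) ** 2 for o in observed),
        "EF1": 1.0 - sum_abs / sum(abs(o - obs_mean) for o in observed),
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    obs = [1.0, 2.0, 3.0, 4.0]
    sim = [1.5, 2.5, 2.5, 4.5]
    for name, value in deviation_statistics(obs, sim).items():
        print(f"{name:>5s} = {value:.3f}")
```

Under these assumed definitions, d and EF share the same squared numerator while d1 and EF1 share the absolute one, which is consistent with the abstract's remark that large deviations weigh more heavily on d and EF. Applying the function to each treatment separately, and not only to the pooled dataset, follows the abstract's recommendation.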
