Evaluation of CMIP5 and CMIP6 simulations of historical surface air temperature extremes using proper evaluation methods

Reliable projections of extremes by climate models are becoming increasingly important in the context of climate change and its societal impacts. Extremes are by definition rare events, so the available samples are small and the associated uncertainties large. Evaluating extreme events in model simulations therefore requires performance measures that compare full distributions rather than simple summaries. This paper proposes the integrated quadratic distance (IQD) for this purpose. The IQD is applied to evaluate CMIP5 and CMIP6 simulations of monthly maximum and minimum near-surface air temperature over Europe and North America against both observation-based data and reanalyses. Several climate models perform well, in the sense that their agreement with the evaluation set is competitive with that of another data product. While the model rankings vary with region, season, and index, the evaluation is robust to changes in the grid resolution used in the analysis. When the model simulations are ranked by their similarity to the ERA5 reanalysis, more CMIP6 than CMIP5 models appear at the top of the ranking. When evaluated against the HadEX2 data product, the overall performance of the two model ensembles is similar.
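
To illustrate what comparing full distributions means in practice, the following is a minimal sketch of an IQD computation between two samples. It assumes the standard definition IQD(F, G) = ∫ (F(x) − G(x))² dx applied to empirical distribution functions. The function name `empirical_iqd`, the synthetic data, and the model labels are hypothetical and are not taken from the paper; the paper's own estimation procedure may differ.

```python
import numpy as np


def empirical_iqd(sample_f, sample_g):
    """Integrated quadratic distance between two empirical CDFs.

    Assumes IQD(F, G) = integral of (F(x) - G(x))^2 dx, with F and G
    taken to be the empirical CDFs of the two samples.
    """
    f = np.sort(np.asarray(sample_f, dtype=float))
    g = np.sort(np.asarray(sample_g, dtype=float))
    # Both step functions can only jump at observed values.
    grid = np.unique(np.concatenate([f, g]))
    # Right-continuous empirical CDF values at each grid point.
    cdf_f = np.searchsorted(f, grid, side="right") / f.size
    cdf_g = np.searchsorted(g, grid, side="right") / g.size
    # The CDFs are constant between consecutive grid points, so the
    # integral is an exact sum of rectangle areas. Outside the grid
    # both CDFs are 0 or both are 1, contributing nothing.
    widths = np.diff(grid)
    return float(np.sum((cdf_f[:-1] - cdf_g[:-1]) ** 2 * widths))


# Hypothetical usage: rank synthetic "models" against a synthetic reference.
rng = np.random.default_rng(0)
reference = rng.normal(loc=30.0, scale=2.0, size=360)  # stand-in for monthly maxima
models = {
    "model_a": rng.normal(loc=30.2, scale=2.1, size=360),
    "model_b": rng.normal(loc=32.0, scale=3.0, size=360),
}
ranking = sorted(models, key=lambda name: empirical_iqd(models[name], reference))
print(ranking)  # lower IQD means closer to the reference distribution
```

A lower value indicates closer agreement with the reference distribution, and a value of zero means the two empirical distributions coincide. In an analysis like the one described above, such a distance would typically be computed per grid cell, season, and index before being aggregated; that aggregation is not shown here.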
