Sensitivity of Verification Scores to the Classification of the Predictand

Abstract In the practice of forecast verification, the results of applying scoring rules appear to depend on the way the predictand is classified. This paper examines the sensitivity of six scoring rules to the classification. The approach is purely theoretical, in the sense that a Gaussian model is specified for both forecasts and observations. Scoring results for this model are calculated for different scoring rules and different classifications. The results favor the Ranked Probability Score (RPS), which is almost insensitive to the classification. Further, categorical scoring rules perform better in this respect than probabilistic scoring rules, with the exception of the RPS. The use of the other three scoring rules for probability forecasts is not recommended for the verification of forecasts of ordered predictands, that is, when the classification involves more than two classes.
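
The setup described in the abstract lends itself to a simple numerical illustration. The following is a minimal sketch in Python (assuming NumPy and SciPy) of the kind of calculation involved: forecasts and observations are drawn from a bivariate Gaussian, the predictand is classified into K climatologically equiprobable classes, and the RPS is averaged over many cases. The correlation value, the equiprobable thresholds, and the division by K - 1 (to make scores comparable across classifications) are illustrative assumptions, not necessarily the paper's exact model.

```python
# Sketch of an RPS sensitivity experiment under an assumed Gaussian model.
# All parameter choices (correlation, class boundaries, normalization) are
# illustrative; the paper's actual model may differ.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
rho = 0.7        # assumed forecast-observation correlation
n = 20_000       # number of simulated forecast cases

# Bivariate Gaussian: f is the forecast signal, x the verifying observation,
# so that x | f ~ N(rho * f, 1 - rho**2).
f = rng.standard_normal(n)
x = rho * f + np.sqrt(1 - rho**2) * rng.standard_normal(n)
sigma = np.sqrt(1 - rho**2)   # spread of the conditional forecast pdf

for K in (2, 3, 5, 10):
    # Class boundaries: climatologically equiprobable thresholds.
    thresholds = norm.ppf(np.arange(1, K) / K)
    # Cumulative forecast probabilities P_m = P(x <= t_m | f), shape (n, K-1).
    P = norm.cdf((thresholds[None, :] - rho * f[:, None]) / sigma)
    # Cumulative observation indicators O_m = 1{x <= t_m}.
    O = (x[:, None] <= thresholds[None, :]).astype(float)
    # RPS: sum of squared cumulative differences, divided here by K - 1
    # so that scores for different classifications share a common scale.
    rps = np.mean(np.sum((P - O) ** 2, axis=1)) / (K - 1)
    print(f"K={K:2d} classes: normalized RPS = {rps:.4f}")
```

Running such a sketch shows the normalized RPS varying only slightly with K, consistent with the abstract's conclusion that the RPS is almost insensitive to the classification.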