The Comparison and Evaluation of Forecasters.

In this paper we present methods for comparing and evaluating forecasters whose predictions are presented as their subjective probability distributions of various random variables that will be observed in the future, e.g. weather forecasters who each day must specify their own probabilities that it will rain in a particular location. We begin by reviewing the concepts of calibration and refinement, and describiing the relationship between this notion of refinement and the notion of sufficiency in the comparison of statistical experiments. We also consider the question of interrelationships among forecasters and discuss methods by which an observer should combine the predictions from two or more different forecasters. Then we turn our attention to the concept of a proper scoring rule for evaluating forecasters, relating it to the concepts of calibration and refinement. Finally, we discuss conditions under which one fore- caster can exploit the predictions of another forecaster to obtain a better score. In this paper we describe some concepts and methods appropriate for evaluating and com- paring forecasters who repeatedly present their predictions of whether or not various events will occur in terms of their subjective probabilities of those events. The ideas we describe here are relevant in almost any situation in which forecasters must repeatedly make such probabilistic predictions, regardless of the particular subject matter or substantive area of the events being forecast. The forecaster might be an economist who at the beginning of each quarterly period makes predictions about unemployment, the rate of inflation, or Gross National Product in that quarter based on the values of various economic indicators; the forecaster might even make predictions using a large-scale econometric model of the United States economy based on hundreds of variables and econometric relations. In a different field, the forecaster might be the weatherman for a television station who at the beginning of each day must announce his probability that it will rain during the day. For ease of exposition, we present our discussions here in the context of such a weather forecaster who day after day must specify his subjective probability x that there will be at least a certain amount of rain at some given location during a specified time interval of the day. We refer to the occurrence of this well-specified event simply as "rain". Thus, at the begin- ning of each day the forecaster must specify his probability of rain and at the end of each day he observes whether or not rain actually occurred. The probability x specified by the forecaster on any particular day is called his prediction for that day. We shall make the realistic, and simultaneously simplifying, assumption that the