On The Calibration of Probability Judgments: Some Critical Comments and Alternative Perspectives

Calibration of probability judgments has attracted in recent years an increasing number of researchers, as reflected by an expanding number of articles in the literature on judgment and decision making. The fundamental question underlying this line of research concerns the standards by which probability judgments could (or should) be assessed and evaluated. The most commonly (though certainly not exclusively) accepted criterion is what has been termed 'calibration', the roots of which can be traced to the well-known Brier score (Brier, 1950) and subsequent modifications (e.g. Murphy, 1973; Yates, 1982, 1988). Two main criteria that evolved from this line of research are customarily referred to as calibration and resolution. Calibration (or reliability) supposedly measures the accuracy of probability judgments, whereas resolution measures the diagnosticity (or discriminability) of these judgments. The two major substantive and pervasive findings (e.g. Lichtenstein, Fischhoff, and Phillips, 1982; Keren, 1991) are overconfidence and the interaction between the amount of overconfidence and the difficulty of the task, the so-called hard-easy effect. Several problems have been raised with regard to research on calibration, and in this commentary I would like to focus on three of them. First, calibration studies assume (implicitly or explicitly) that probabilities are subjective (e.g. Lichtenstein, Fischhoff, and Phillips, 1982) yet evaluate them by a frequentistic criterion (Gigerenzer, 1991; Keren, 1991). The validity of such a procedure remains controversial. A second problem concerns the possible tradeoff between calibration and resolution. Yates (1982) noted that calibration and resolution are not completely independent of each other, and Keren (1991) claimed that the requirements for maximizing calibration (i.e. minimizing the discrepancies between probability judgments and the corresponding reality) and achieving high resolution may often be incompatible.
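The relation between these two criteria can be made concrete through Murphy's (1973) partition of the mean probability (Brier) score, PS = reliability - resolution + uncertainty. The sketch below is illustrative only (the probability bins and outcome counts are invented, not data from any study cited here); it computes both sides of the identity for a small set of binary judgments:

```python
import numpy as np

def murphy_decomposition(probs, outcomes):
    """Murphy's (1973) partition of the mean probability (Brier) score:
    PS = reliability - resolution + uncertainty."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n = len(probs)
    base_rate = outcomes.mean()

    reliability = 0.0  # calibration: gap between stated probability and hit rate
    resolution = 0.0   # diagnosticity: spread of hit rates around the base rate
    for p in np.unique(probs):
        group = outcomes[probs == p]
        reliability += len(group) * (p - group.mean()) ** 2
        resolution += len(group) * (group.mean() - base_rate) ** 2

    brier = np.mean((probs - outcomes) ** 2)
    return brier, reliability / n, resolution / n, base_rate * (1 - base_rate)

# Invented example: ten "0.9" judgments with 8 hits, ten "0.6" judgments with 5 hits
probs = [0.9] * 10 + [0.6] * 10
outcomes = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5
brier, rel, res, unc = murphy_decomposition(probs, outcomes)
assert abs(brier - (rel - res + unc)) < 1e-12  # the identity holds exactly
```

One way to see the potential tradeoff: a judge who always states the base rate has perfect calibration (reliability of zero) but zero resolution, whereas sorting items into sharply different hit-rate groups raises resolution while risking larger calibration gaps.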
A similar point has recently been made by Yaniv and Foster (1995), who studied the evaluation of interval judgments. A third problem concerns the analysis and interpretation of calibration studies. Specifically, Erev, Wallsten, and Budescu (1994) have eloquently described the importance of regression toward the mean in interpreting calibration studies. Similar conclusions were reached independently by Pfeifer (1994). In a nutshell, the contribution of the papers by Erev et al. and Pfeifer is in pointing out that both overconfidence and the hard-easy effect may, at least to some degree, be an artifact of regression toward the mean. In reflecting on the articles in this special volume, I will focus on these three issues and examine how they are treated by the different authors. I will end this commentary by raising the question of what has been learned from thirty years of research on calibration of probabilities, and will offer a brief (and somewhat skeptical) answer to the question.
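The regression argument can be illustrated with a small simulation, a sketch in the spirit of the error model of Erev, Wallsten, and Budescu (1994) rather than their exact formulation (the uniform range and error standard deviation below are assumed for illustration): stated confidence is a 'true' probability plus unbiased random error, so the simulated judge harbors no substantive bias, yet conditioning on extreme stated confidence produces apparent overconfidence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent probability of being correct on each item; accuracy follows it exactly
true_p = rng.uniform(0.5, 1.0, n)
correct = rng.random(n) < true_p

# Stated confidence = true probability + unbiased response error (assumed sd = 0.1)
stated = np.clip(true_p + rng.normal(0.0, 0.1, n), 0.5, 1.0)

# Condition on highly confident responses, as a calibration curve does
high = stated >= 0.95
gap = stated[high].mean() - correct[high].mean()
# gap > 0: the unbiased judge looks "overconfident" at the high end, purely
# because extreme stated values tend to contain positive error and the
# corresponding hit rates regress toward the mean.
```

The mirror pattern, apparent underconfidence, emerges when conditioning on the lowest stated values, which is why conditioning direction matters when interpreting calibration plots.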

[1]  W. Ferrell, et al. The Hard-Easy Effect in Subjective Probability Calibration, 1996.

[2]  D. Kahneman, et al. On the reality of cognitive illusions, 1996, Psychological Review.

[3]  James Ramirez, et al. Good probabilistic forecasters: The 'consumer's' perspective, 1996.

[4]  C. Varey, et al. Towards a Consensus on Overconfidence, 1996.

[5]  Lyle Brenner, et al. Overconfidence in Probability and Frequency Judgments: A Critical Examination, 1996.

[6]  J. Yates, et al. Beliefs about Overconfidence, Including Its Cross-National Variation, 1996.

[7]  Jack B. Soll. Determinants of Overconfidence and Miscalibration: The Roles of Random Error and Ecological Structure, 1996.

[8]  Dean P. Foster, et al. Graininess of judgment under uncertainty: An accuracy-informativeness trade-off, 1995.

[9]  I. Erev, et al. Simultaneous Over- and Underconfidence: The Role of Error in Judgment Processes, 1994.

[10]  J. Baranski, et al. The calibration and resolution of confidence in perceptual judgments, 1994, Perception & Psychophysics.

[11]  M. Björkman. Internal Cue Theory: Calibration and Resolution of Confidence in General Knowledge, 1994.

[12]  Phillip E. Pfeifer. Are We Overconfident in the Belief That Probability Forecasters Are Overconfident?, 1994.

[13]  P. Juslin. The Overconfidence Phenomenon as a Consequence of Informal Experimenter-Guided Selection of Almanac Items, 1994.

[14]  A. Koriat. How do we know that we know? The accessibility model of the feeling of knowing, 1993, Psychological Review.

[15]  P. Juslin. An explanation of the hard-easy effect in studies of realism of confidence in one's general knowledge, 1993.

[16]  P. Juslin, et al. Realism of confidence in sensory discrimination: The underconfidence phenomenon, 1993, Perception & Psychophysics.

[17]  A. Tversky, et al. The weighing of evidence and the determinants of confidence, 1992, Cognitive Psychology.

[18]  G. Keren. Calibration and probability judgements: Conceptual and methodological issues, 1991.

[19]  G. Gigerenzer, et al. Probabilistic mental models: A Brunswikian theory of confidence, 1991, Psychological Review.

[20]  G. Gigerenzer. How to Make Cognitive Illusions Disappear: Beyond “Heuristics and Biases”, 1991.

[21]  J. Frank Yates. Analyzing the accuracy of probability judgments for multiple events: An extension of the covariance decomposition, 1988.

[22]  G. Keren, et al. On the ability of monitoring non-veridical perceptions and uncertain knowledge: Some calibration studies, 1988, Acta Psychologica.

[23]  Dane K. Peterson, et al. Confidence, uncertainty, and the use of information, 1988.

[24]  Elisha Y. Babad, et al. Wishful thinking and objectivity among sports fans, 1987.

[25]  David L. Ronis, et al. Components of probability judgment accuracy: Individual consistency and effects of subject matter and assessment method, 1987.

[26]  Gideon Keren. Facing uncertainty in the game of bridge: A calibration study, 1987.

[27]  Robin M. Hogarth. Generalization in Decision Research: The Role of Formal Models, 1986, IEEE Transactions on Systems, Man, and Cybernetics.

[28]  Amnon Rapoport, et al. Measuring the Vague Meanings of Probability Terms, 1986.

[29]  J. Harrison, et al. Decision Making and Postdecision Surprises, 1984.

[30]  J. F. Yates. External correspondence: Decompositions of the mean probability score, 1982.

[31]  B. Fischhoff, et al. Calibration of probabilities: The state of the art to 1980, 1982.

[32]  Jay J. J. Christensen-Szalanski, et al. Physicians' use of probabilistic information in a real clinical setting, 1981.

[33]  B. Fischhoff, et al. Reasons for confidence, 1980.

[34]  B. Fischhoff, et al. Knowing with Certainty: The Appropriateness of Extreme Confidence, 1977.

[35]  A. Tversky. Features of Similarity, 1977.

[36]  Louis Guttman. What Is Not What in Statistics, 1977.

[37]  T. Fine, et al. The Emergence of Probability, 1976.

[38]  A. H. Murphy. A New Vector Partition of the Probability Score, 1973.

[39]  Howard B. Lee, et al. Foundations of Behavioral Research, 1965.

[40]  J. Swets, et al. Decision processes in perception, 1961, Psychological Review.

[41]  G. Brier. Verification of Forecasts Expressed in Terms of Probability, 1950.

[42]  F. Galton. Typical Laws of Heredity, 1877, Nature.