Use of expert judgment in exposure assessment: Part 2. Calibration of expert judgments about personal exposures to benzene

The recent movement of regulatory agencies toward probabilistic analyses of human health and environmental risks has focused greater attention on the quality of the estimates of variability and uncertainty that underlie them. Of particular concern is how uncertainty (a measure of what is not known) is characterized, because uncertainty can play an influential role in analyses of the need for regulatory controls and in estimates of the economic value of additional research. This paper reports the second phase of a study, conducted as part of the National Human Exposure Assessment Survey (NHEXAS), to obtain and calibrate exposure assessment experts' judgments about uncertainty in the residential ambient, residential indoor, and personal air benzene concentrations experienced by the nonsmoking, nonoccupationally exposed population in U.S. EPA Region V. Subjective judgments (the median, interquartile range, and 90% confidence interval) about the mean and 90th percentile of each of the three benzene distributions were elicited from the seven experts participating in the study. The calibration, or quality, of the experts' judgments was assessed by comparing them with the actual measurements from the NHEXAS Region V study using graphical techniques, a quadratic scoring rule, and surprise and interquartile indices. The results from both quantitative scoring methods suggested that, considered collectively, the experts' judgments were relatively well calibrated, although, on balance, underconfident. The calibration of individual experts' judgments appeared to vary, highlighting the potential pitfalls of relying on any single expert. In a surprising finding, the experts' judgments about the 90th percentiles of the benzene distributions were better calibrated than their judgments about the means; the experts tended to be overconfident in their ability to predict the means.
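The surprise and interquartile indices, and one form of quadratic scoring rule, can be illustrated with a short sketch. It assumes each elicited judgment is summarized by its 5th, 25th, 50th, 75th, and 95th percentiles (the median, interquartile range, and 90% interval), and that the quadratic score is a Brier-type score over the six bins those percentiles define. That binning, the `Judgment` class, the function names, and the example numbers are all hypothetical illustrations, not the study's implementation or data. For well-calibrated judgments the interquartile index should sit near 0.50 and the surprise index near 0.10. An interquartile index above 0.50 together with a surprise index below 0.10 suggests underconfidence (intervals too wide), while a surprise index above 0.10 suggests overconfidence.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed nominal probability in each of the six bins bounded by the
# elicited 5th, 25th, 50th, 75th and 95th percentiles.
BIN_PROBS = (0.05, 0.20, 0.25, 0.25, 0.20, 0.05)


@dataclass(frozen=True)
class Judgment:
    """One elicited distribution for a single quantity (hypothetical structure)."""
    p05: float
    p25: float
    p50: float
    p75: float
    p95: float

    def bin_index(self, x: float) -> int:
        """Index (0-5) of the bin containing x; a value on a cut point falls in the lower bin."""
        cuts = (self.p05, self.p25, self.p50, self.p75, self.p95)
        return sum(x > c for c in cuts)


def interquartile_index(judgments, observed):
    """Fraction of observed values inside [p25, p75]; ideal is 0.50."""
    hits = sum(j.p25 <= x <= j.p75 for j, x in zip(judgments, observed))
    return hits / len(observed)


def surprise_index(judgments, observed):
    """Fraction of observed values outside [p05, p95]; ideal is 0.10."""
    misses = sum(not (j.p05 <= x <= j.p95) for j, x in zip(judgments, observed))
    return misses / len(observed)


def quadratic_score(judgment, x):
    """Brier-type quadratic score over the six bins; 0 is perfect, lower is better."""
    hit = judgment.bin_index(x)
    return sum((p - (1.0 if i == hit else 0.0)) ** 2 for i, p in enumerate(BIN_PROBS))


# Hypothetical judgments (ug/m3) and the measured values they are scored against.
judgments = [
    Judgment(1.0, 2.0, 3.0, 4.0, 6.0),
    Judgment(2.0, 3.0, 4.0, 5.0, 8.0),
    Judgment(0.5, 1.0, 1.5, 2.0, 3.0),
    Judgment(1.0, 1.5, 2.0, 3.0, 5.0),
]
observed = [3.5, 9.0, 1.2, 2.5]

iq = interquartile_index(judgments, observed)  # 0.75
si = surprise_index(judgments, observed)       # 0.25
qs = mean(quadratic_score(j, x) for j, x in zip(judgments, observed))  # about 0.81
print(iq, si, round(qs, 2))
```

Under this scoring, a measurement landing in a central bin costs 0.71, whereas one landing beyond a 90% interval bound costs 1.11. The quadratic score therefore penalizes surprises more heavily than the simple counts in the two indices do.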
This paper is also one of the first calibration studies to demonstrate the importance of accounting for intraexpert correlation when assessing the statistical significance of the findings. When the judgments were assumed to be independent, analysis of the surprise and interquartile indices found evidence of poor calibration (P < 0.05). However, once the intraexpert correlation in the study was taken into account, these findings were no longer statistically significant. The analysis further found that the experts' judgments scored better than estimates of Region V benzene concentrations drawn directly from earlier studies of ambient, indoor, and personal benzene levels in other U.S. cities. These results suggest the value of careful elicitation of expert judgments in characterizing exposures in probabilistic form. Additional calibration studies are needed to corroborate and extend these findings.
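The effect of intraexpert correlation on significance can be sketched with a normal-approximation test whose binomial variance is inflated by a design effect, 1 + (m - 1)ρ. Here m is the number of judgments each expert supplied and ρ is an intraexpert correlation. This is a generic clustered-data adjustment and not necessarily the procedure used in the study. The function names, the assumed correlation of 0.3, and the miss count are hypothetical. The grouping of 42 judgments as 7 experts by 6 quantities follows from the design described in the abstract: means and 90th percentiles of three benzene distributions.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def calibration_z_test(hits: int, n: int, p0: float,
                       per_expert: int = 1, icc: float = 0.0):
    """Normal-approximation test of an observed rate hits/n against nominal p0.

    The binomial variance p0 * (1 - p0) / n is inflated by the design effect
    1 + (per_expert - 1) * icc, where per_expert is the number of judgments
    each expert supplied and icc is an assumed intraexpert correlation.
    icc = 0 recovers the test that treats every judgment as independent.
    """
    design_effect = 1.0 + (per_expert - 1) * icc
    se = math.sqrt(p0 * (1.0 - p0) / n * design_effect)
    z = (hits / n - p0) / se
    return z, two_sided_p(z)


# Hypothetical surprise-index data: 7 experts x 6 quantities = 42 judgments,
# 9 of which fell outside their 90% interval (nominal miss rate 0.10).
z_ind, p_ind = calibration_z_test(9, 42, 0.10)
z_clu, p_clu = calibration_z_test(9, 42, 0.10, per_expert=6, icc=0.3)
print(f"independent: z={z_ind:.2f}, p={p_ind:.3f}")  # z about 2.47, p about 0.014
print(f"clustered:   z={z_clu:.2f}, p={p_clu:.3f}")  # z about 1.56, p about 0.118
```

The same observed miss rate is significant at the 0.05 level when the 42 judgments are treated as independent, but not once they are treated as six correlated judgments from each of seven experts. With n = 42 and a nominal miss rate of 0.10, the expected count is only about 4, so the normal approximation is rough. An exact or overdispersed-binomial (for example, beta-binomial) analysis would be a more careful choice.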