Quantifying Expert Opinion in Linear Regression Problems

SUMMARY
This paper describes a method for choosing a natural conjugate prior distribution for a normal linear sampling model. A person using the method to quantify his/her opinions performs specified elicitation tasks, and the hyperparameters of the conjugate distribution are estimated from the elicited values. The method is designed to require only elicitation tasks that people can perform competently, and it introduces a type of task not previously reported. A property of the method is that the assessed variance matrices are certain to be positive definite. The method is simple enough to implement as an interactive computer program on a microcomputer.

Bayesian statistical methods provide a formal mechanism for taking into account prior knowledge, that is, information available 'prior' to the statistical data to be analysed. In many circumstances, prior knowledge is based on historical data that are recorded only in the form of the personal experience of experts. To use this information, the expert must quantify his/her opinions by answering comprehensible questions concerning unknown but definite quantities of direct interest. The set of answers should enable a probability distribution to be determined. This distribution must satisfy the usual laws of probability so that, for example, any variance-covariance matrix must be positive definite. In addition, the distribution should be 'accurate' in some sense. For example, it might be required to describe the expert's knowledge closely or, perhaps, to predict subsequent events with relative success. The accuracy of the assessed distribution will depend, in part, on the method of elicitation.

In this paper we present a method of assessing a subjective prior distribution for a normal linear sampling model.
This is an important task, owing to the wide applicability of such models and the many situations where expert personal opinion could be used more efficiently, communicated more accurately and judged more critically if it were available in a suitable form. We denote a particular setting of the independent variables by x and the dependent variable by y, referring to the former as a design point and the latter as the response. The model specifies that the 'objective' or 'sampling' distribution of y, conditional on x and the sampling model parameters