A Note on the Effects of Nonresponse on Surveys

Berkson [1] has recently illustrated, by a hypothetical numerical example, what might happen in a survey of the association between cigarette smoking and cancer of the lung if the rates of recruitment of the population to the survey vary according to the situation of the individual. It seems worthwhile to present his model in more generality and to explore some of its properties.

Berkson proposes that the overall population of size N accessible to the survey team (which, of course, may not be representative of the corresponding overall group in the U.S.) is, unbeknownst to us, or at least unrecorded by the survey team, actually divided into two categories: those who are close to expiring and those who are not. Call these groups "unhealthy" and "healthy," and let the proportions of the population belonging to these groups be C and (1-C), with death rates D1 and D2 respectively.

It is clear that, even if correct in principle, this model must be an oversimplification in detail, since the concept of only two states of health, with their associated probabilities C and (1-C) and death rates D1 and D2, is too crude. A model with some parameter of health, say h, continuous over some interval, with a probability distribution and, corresponding to each value of h, a death rate D(h), presumably a monotonic function of h, would be more realistic. To consider such a model we would need to specify the distribution of h and the form of the function D(h). It seems probable that the main features of such a more complex model will emerge from our simple model.

In this model, we assume that the probability of an individual being a smoker is S, irrespective of whether he belongs to the unhealthy or healthy group. We now suppose that the recruitment rates of the population to the survey will be represented by the following symbols: