Editorial

Although large sample surveys of health-related issues are widely used, a typical medical statistician or clinical researcher will frequently be only dimly aware of the implications of the survey's design for valid statistical inference. This can be a particular problem when the survey data are made publicly available for secondary analysis by researchers not linked to the original survey team.

The aim of the present issue is to expose the nonspecialist to the ideas and methods of the specialist survey statistician. It aims partly to indicate how data from a complex survey might be analysed validly, and partly to stimulate the reader to think of ways in which ideas from complex survey statistics might influence the routine analysis of medical data from sources other than large complex surveys. An example of the latter is the possible role of weighting in coping with missing values arising, for example, from refusal to provide sensitive information in an epidemiological study or from withdrawal from a clinical trial.

One does not need specialized software to do this, although, as stated in several of the papers here, one must be very wary of using most of the common packages, because they interpret the weights as counts of identical observations. The Huber procedures within Stata,1 for example, allow for probability weighting and clustering. One can simply fit a logistic regression model, for example, with and without weighting and check whether the results are similar. Are the nonweighted results robust? If not, why not? Are there biases arising from ignoring the mechanism that generated the missing values? Weighting will not provide a panacea, as the authors in this issue frequently point out, but its intelligent use might stimulate deeper thought about the message contained within the data. Another trick is to treat the subject as a grouping or clustering variable in a repeated measures analysis.
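A minimal sketch in Python, assuming NumPy, of the two checks suggested above. It fits a logistic regression with and without inverse-probability weights, then computes Huber (sandwich) standard errors that treat the subject as a cluster. The functions `fit_logistic` and `sandwich_se`, the simulated data and the response probabilities `p_resp` are hypothetical illustrations, not Stata's implementation. The response probabilities are taken as known here, whereas in practice they would have to be estimated, and no finite-sample correction is applied to the clustered variance.

```python
import numpy as np

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic(X, y, w=None, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson.

    w are probability (inverse-response) weights: they change the point
    estimate only; the variance comes from sandwich_se, not from treating
    w as counts of identical observations.
    """
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = expit(X @ beta)
        score = X.T @ (w * (y - mu))                      # weighted score vector
        info = X.T @ (X * (w * mu * (1 - mu))[:, None])   # weighted information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def sandwich_se(X, y, beta, w=None, cluster=None):
    """Huber (sandwich) standard errors.

    With cluster = subject id, scores are summed within each subject, which
    gives the working-independence GEE variance (no finite-sample correction).
    """
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    mu = expit(X @ beta)
    bread = np.linalg.inv(X.T @ (X * (w * mu * (1 - mu))[:, None]))
    u = X * (w * (y - mu))[:, None]                       # per-observation scores
    if cluster is not None:
        u = np.array([u[cluster == g].sum(axis=0) for g in np.unique(cluster)])
    return np.sqrt(np.diag(bread @ (u.T @ u) @ bread))

# Hypothetical data: 200 subjects, 3 visits each, with a shared subject
# effect so that visits from the same subject are correlated.
rng = np.random.default_rng(1)
n_subj, n_vis = 200, 3
subject = np.repeat(np.arange(n_subj), n_vis)
x = rng.normal(size=n_subj * n_vis)
b = np.repeat(rng.normal(size=n_subj), n_vis)
y = (rng.random(x.size) < expit(-0.5 + x + b)).astype(float)

# Hypothetical nonresponse: positive outcomes are withheld more often.
p_resp = np.where(y == 1, 0.4, 0.9)
seen = rng.random(x.size) < p_resp
X = np.column_stack([np.ones(seen.sum()), x[seen]])
ys, ws, ids = y[seen], 1.0 / p_resp[seen], subject[seen]

beta_unw = fit_logistic(X, ys)
beta_w = fit_logistic(X, ys, w=ws)
se_w = sandwich_se(X, ys, beta_w, w=ws, cluster=ids)
print("nonweighted:", beta_unw)
print("weighted:   ", beta_w, "cluster-robust SE:", se_w)
```

Because nonresponse in this made-up example depends on the outcome, the two fits should differ mainly in the intercept. A discrepancy of that kind is the cue to ask why the nonweighted results are not robust.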
Treating the subject as a cluster in this way is equivalent to fitting a working independence model within the generalized estimating equations (GEE) framework. This too can be implemented using the Huber procedures within Stata. Examples of the use of both weighting and working independence models in epidemiological surveys can be found in the review by Pickles et al. If these excellent reviews have whetted your appetite, I strongly recommend the book by Lehtonen and Pahkinen,3 reviewed by Alan Taylor at the end of this issue.

Graham Dunn

[1] G. Dunn et al. Screening for stratification in two-phase ('two-stage') epidemiological surveys. Statistical Methods in Medical Research, 1995.