Design of cross‐sectional surveys using cluster sampling: an overview with Australian case studies

Cross-sectional surveys are a mainstay of research in epidemiology and public health, despite the fact that their relatively simple design allows little scope to investigate causal relationships. Their appeal lies in their ability to provide descriptive information about the health of populations (prevalence estimates) and to estimate associations between health states and demographic and other background factors. Although such cross-sectional associations need to be interpreted cautiously, they often provide useful insights into the aetiology of diseases and health-risky behaviours. Examples of cross-sectional sample surveys published recently in Australia include the large-scale National Health Survey1 conducted by the Australian Bureau of Statistics and smaller-scale studies on vaccination coverage in an Australian city,2 transport patterns among young school children3 and health and health-related behaviour of adolescents.4,5

Given that surveys aim to provide valid estimates of population parameters, random selection of participants is a key principle (unlike, for instance, in randomised controlled trials, where it is randomised allocation between groups that is the key to valid comparisons). Surveys based on simple random samples, where participants are randomly selected from a complete list of all eligible subjects, are easy to conceptualise and analyse, but often very difficult to carry out. Instead, many survey samples are selected using cluster sampling methods, where members of the population are grouped in clusters and sampling takes place in two (or more) stages: first a sample of clusters is selected and then a sample of individuals within clusters. A classic example of cluster sampling is provided by the WHO Expanded Programme on Immunisation (EPI), for which a modified cluster sampling methodology6-8 was developed to estimate immunisation coverage in developing countries.
This survey entails the random selection of 30 clusters (usually geographical areas) followed by quota (non-random) sampling of seven individuals from each cluster. When data for creating sampling frames (lists of eligible subjects) are absent or of dubious quality, as is often the case in developing countries, the EPI technique is a feasible and cost-effective means of collecting information. However, since it is problematic to apply the methods of statistical inference used for surveys to non-random samples, the EPI method precludes the calculation of standard errors and confidence intervals for survey estimates.9 More rigorous cluster-based probability sampling methods are, however, used in many cross-sectional health surveys where, for feasibility reasons, it is necessary to base sampling on 'natural' clusters, such as schools, day-care centres, general practices or community health centres.

The related methodology of cluster-based or group randomised trials has received considerable attention,10-14 including a recent textbook.15 Cluster randomisation has been widely employed in the evaluation of community health interventions, such as the Minnesota Heart Health Program,16 and many trials take advantage of other clusters such as school classes, general practices, work places and households.17-19

In contrast with cluster-based trials, there has been less written in the public health literature on the use of cluster sampling in cross-sectional surveys. This paper reviews issues of sample design, statistical analysis and sample size requirements, with the aim of providing a simple and accessible explanation of cluster sampling and related technical issues. The concepts of design effect, intra-cluster correlation (ICC) and sampling weights are discussed.
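As a rough illustration of how the design effect and the ICC relate, a widely used approximation for clusters of equal size m is deff = 1 + (m - 1) × ICC. The sketch below applies it to a layout of 30 clusters of seven, borrowed from the EPI example purely for its dimensions (the approximation presumes probability sampling within clusters, which the EPI quota step does not provide). The function names and the ICC value of 0.1 are illustrative assumptions, not estimates from this paper.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Size of a simple random sample giving roughly the same precision."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


# Hypothetical inputs: 30 clusters of 7 and an assumed (not estimated) ICC of 0.1.
deff = design_effect(7, 0.1)
n_eff = effective_sample_size(30, 7, 0.1)
print(f"design effect = {deff:.2f}, effective sample size = {n_eff:.2f}")
```

Under these assumed values, the 210 sampled individuals carry about as much information as a simple random sample of roughly 131, which is why the ICC matters when planning sample size.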
A child transportation study3 and an adolescent health survey4,5 are re-analysed, first, to illustrate the statistical issues associated with cluster surveys and, second, to estimate design effects and ICCs associated with a number of outcomes. Our publication of estimated ICCs based on Australian data should assist other researchers in planning similar surveys. These estimates will also be useful in the planning of cluster randomised trials, for which the same information on intra-cluster correlation is needed.
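To make the idea of estimating an ICC from clustered data concrete, the following minimal sketch uses the one-way analysis-of-variance estimator for clusters of equal size. It is one common estimator and not necessarily the method used in the re-analyses reported here; the function name `anova_icc` and the toy data are hypothetical.

```python
from statistics import mean


def anova_icc(clusters: list[list[float]]) -> float:
    """One-way ANOVA estimate of the ICC, assuming all clusters have equal size."""
    k = len(clusters)                      # number of clusters
    m = len(clusters[0]) if k else 0       # common cluster size
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters, all of the same size >= 2")

    grand_mean = mean(y for c in clusters for y in c)
    cluster_means = [mean(c) for c in clusters]

    # Between-cluster mean square (k - 1 degrees of freedom).
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square (k * (m - 1) degrees of freedom).
    msw = sum(
        (y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c
    ) / (k * (m - 1))

    denominator = msb + (m - 1) * msw
    if denominator == 0:
        raise ValueError("no variation in the data; ICC is undefined")
    return (msb - msw) / denominator


# Invented binary outcomes (1 = yes, 0 = no) for three clusters of four people.
toy_clusters = [
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
icc = anova_icc(toy_clusters)
print(f"estimated ICC = {icc:.3f}")
```

The toy data are deliberately extreme, so the estimate is far larger than would usually be expected in practice. The estimator can also return small negative values when clusters are no more alike than chance would suggest.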

