SUMMARY A test is given to detect clustering in disease incidence or mortality data. The test statistic is the mean distance between all pairs of disease cases. Its null mean and variance and its asymptotic normality are derived under assumptions that accommodate differences in population distribution among demographic subgroups at different disease risk. The test is illustrated on 63 cases of anal and rectal squamous cell carcinoma in San Francisco during 1973-1981.

Patterns of disease incidence and mortality over time, space, or occupational categories can provide clues to the cause of the disease. An aetiologic agent can produce a spatial, temporal, or occupational cluster of disease cases. It is seldom difficult to detect clusters of rare diseases, such as angiosarcoma of the liver among men occupationally exposed to polyvinyl chloride. However, more common diseases present two problems. First, clusters may be obscured by the scattered occurrence of cases unrelated to the cause of the clusters. Secondly, clusters may be produced by factors unrelated to the disease process, such as variations in the overall population distribution, or variations in the distributions of demographic subgroups at high disease risk. It is therefore useful to have a method for detecting clusters that adjusts for such factors.

Previous tests for clustering, for example those of Pinkel & Nefzger (1959), Knox (1964) and Mantel (1967), are not satisfactory for the study of chronic diseases such as cancer. Those tests are designed to determine whether cases are clustered in space and in time simultaneously. However, cases of chronic disease caused by a spatially localized agent may be close in space but are unlikely to be close in time, because of the long and variable period between exposure and diagnosis. Thus there is a need for alternative methods to detect spatial clusters in regional disease incidence or mortality data.

Consider the set $X = \{x_1, \ldots, x_K\}$ of points called census tracts, with $N_k$ cases of disease, called cancer, in tract $x_k$. Under the hypothesis $H_0$ of no clustering, the $N_k$ are independent Poisson variables, with the mean of $N_k$ proportional to the population size $e_k$ in tract $x_k$.
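The statistic and the null hypothesis described above can be illustrated with a short sketch. It is not the asymptotic normal test derived in the paper: it swaps in a Monte Carlo approximation of the null distribution, conditioning on the total number of cases (independent Poisson counts with means proportional to $e_k$ are, given their total, multinomial with probabilities $e_k / \sum_j e_j$). The function names, the use of Euclidean distance between tract coordinates, and the convention that two cases in the same tract are at distance 0 are all assumptions made for illustration.

```python
import math
import random


def mean_pairwise_distance(counts, coords):
    """Mean distance over all distinct pairs of cases.

    counts[k] is the number of cases in tract k and coords[k] its location.
    Assumption: two cases in the same tract are at distance 0.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two cases")
    total = 0.0
    for j in range(len(counts)):
        for k in range(j + 1, len(counts)):
            if counts[j] and counts[k]:
                # counts[j] * counts[k] case pairs span tracts j and k
                total += counts[j] * counts[k] * math.dist(coords[j], coords[k])
    return total / (n * (n - 1) / 2)


def monte_carlo_p_value(counts, coords, populations, n_sim=2000, seed=0):
    """Lower-tail Monte Carlo p-value (hypothetical helper).

    Cases are reallocated to tracts with probability proportional to
    population size; a small mean distance suggests clustering.
    """
    rng = random.Random(seed)
    n = sum(counts)
    observed = mean_pairwise_distance(counts, coords)
    tracts = range(len(counts))
    hits = 0
    for _ in range(n_sim):
        simulated = [0] * len(counts)
        for k in rng.choices(tracts, weights=populations, k=n):
            simulated[k] += 1
        if mean_pairwise_distance(simulated, coords) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_sim + 1)


# Made-up example data: four tracts, all six cases in the first tract
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
populations = [1000, 1200, 900, 1100]
counts = [6, 0, 0, 0]
print(monte_carlo_p_value(counts, coords, populations))
```

The sketch uses a single population vector, so it does not adjust for demographic subgroups at different disease risk, which the paper's assumptions are designed to accommodate.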
[1] W. Hoeffding. A Class of Statistics with Asymptotically Normal Distribution, 1948.
[2] D. Pinkel, et al. Some epidemiological features of childhood leukemia in the Buffalo, N.Y., area, 1959, Cancer.
[3] Nathan Mantel, et al. A Statistical Problem in Space and Time: Do Leukemia Cases Come in Clusters?, 1964.
[4] G. Knox. Epidemiology of Childhood Leukaemia in Northumberland and Durham, 1964, British journal of preventive & social medicine.
[5] Samuel Karlin, et al. A First Course on Stochastic Processes, 1968.
[6] N. Mantel. The detection of disease clustering and a generalized regression approach, 1967, Cancer research.
[7] S. R. Searle. Linear Models, 1971.
[8] P. Holland, et al. Discrete Multivariate Analysis, 1976.
[9] T. Tango, et al. The detection of disease clustering in time, 1984, Biometrics.