ON ESTIMATING THE RELATION BETWEEN BLOOD GROUP AND DISEASE

Following the demonstration of a significant excess of blood group A in patients with cancer of the stomach (Aird, Bentall & Roberts, 1953) and of group O in sufferers from peptic ulcer (Aird, Bentall, Mehigan & Roberts, 1954) and from toxaemia of pregnancy (Pike & Dickins, 1954), it seems certain that many more studies will be made on the relation between blood groups and disease. It is therefore important that the best possible statistical methods should be used.

The procedure recommended by Aird et al. (1954) is very efficient, but it is open to criticism on one rather important point. These workers take as criterion the difference in proportion of a given blood group in the disease and the control series. Denote the two blood types α and β. Suppose the disease series contains h patients of type α and k of type β, where h + k = n, and the control series has H of type α and K of type β, where H + K = N. Aird and associates calculate d = h/n - H/N. This is tested for significance against its sampling variance, combined with estimates from other bodies of data to give a weighted mean estimate, and compared with these other estimates in tests for heterogeneity.

Unfortunately, d will differ from one community to another even when the specific attack rate within any given blood group stays constant. This can be shown by a simple example. Consider a community of 10,000 people in which H and K are each 5000. Then if h = 100 and k = 50, that is, if the attack rates are 2% in type α and 1% in type β, d = 100/150 - 0.5, or 0.1667. Now consider another community in which H is 9000 and K is 1000. With the same attack rates, h = 180 and k = 10, so d = 180/190 - 0.9, or 0.0474. Even when the essential biological conditions are identical, differences in blood-group frequencies in the population will introduce spurious heterogeneity.

This kind of artefact is avoided if one works with incidence rates in the various blood groups. The data usually do not permit calculation of absolute rates, nor are they needed.
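The arithmetic of the two-community example above can be checked with a short sketch. The function names and the dictionary of communities are illustrative choices, not part of the original procedure; the figures are the hypothetical ones used in the text.

```python
def difference_in_proportions(h, k, H, K):
    """Criterion d = h/n - H/N, with n = h + k and N = H + K."""
    return h / (h + k) - H / (H + K)


def attack_rates(h, k, H, K):
    """Attack rate within each blood group: cases divided by population."""
    return h / H, k / K


# Hypothetical communities from the text: same attack rates,
# different blood-group frequencies in the population.
communities = {
    "H = K = 5000": dict(h=100, k=50, H=5000, K=5000),
    "H = 9000, K = 1000": dict(h=180, k=10, H=9000, K=1000),
}

for name, counts in communities.items():
    d = difference_in_proportions(**counts)
    rate_alpha, rate_beta = attack_rates(**counts)
    print(f"{name}: d = {d:.4f}, "
          f"attack rates = {rate_alpha:.2%} and {rate_beta:.2%}")
```

Under these assumptions the attack rates come out the same in both communities, while d changes from about 0.1667 to about 0.0474.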
What is wanted and readily obtained is an estimate of the ratio of one rate to another. The incidence in group α will be h/H × some constant, and that in group β will be k/K × the same constant. If the ratio is taken as x to 1, an estimate of x will be hK/Hk, and it may readily be shown that this is the maximum-likelihood estimate. The use of x is recommended instead of d as a criterion of differential incidence of disease in relation to blood group. In all statistical computations it is best to transform x into its logarithm. This avoids difficulties due to asymmetry. If comparison of α with β gives x = 2, say, comparison of β with α will give x = 1/2; but log x will retain its numerical value, merely changing in sign. Moreover, the sampling variance of log x is a very simple expression free of 'nuisance parameters'. This is especially true if one transforms into y = logₑ x. If V is the sampling variance of y, then V = 1/h + 1/k + 1/H + 1/K,