THE ESTIMATION AND SIGNIFICANCE OF THE LOGARITHM OF A RATIO OF FREQUENCIES

It is occasionally useful to estimate the logarithm of the ratio of two classes in a sample. But a much more important case is discussed by Woolf (1955). Out of n cases of a disease, h had the character α and k the character β. Out of N unaffected persons, H had the character α and K the character β. For example, of 1490 cases of peptic ulcer in London investigated by Aird, Bentall, Mehigan & Roberts (1954), h = 911, k = 579, while among 8797 controls, H = 4578, K = 4219, where the characters α and β are membership of blood groups O and A. Hence Woolf calculated hK/kH = 1.4500, ln (hK/kH) = 0.3716, where ln x = log_e x.

Suppose that the frequencies of groups O and A in all cases of peptic ulcer in London were p and q, and among all other Londoners P and Q; then ln (hK/kH) is an efficient estimate of ln (pQ/qP). But it has a bias which is not always negligible. The estimate hK/kH has formally an infinite expectation, and all the moments of its distribution are infinite. For there is a finite, though very small, probability that either k or H should be zero. Even if such cases are neglected it has a bias of order n⁻¹. As pointed out by Haldane & Maynard Smith (1956), the bias of hK/{(k + 1)(H + 1)} tends to zero more rapidly than any negative power of n or N. Similarly, all the moments of the distribution of the logarithm are formally infinite, since any of h, k, H and K may be zero with a finite probability. This fact by itself is unimportant, but it is an indication that a less biased expression can be found.

Since the two samples were taken independently, their sampling errors are uncorrelated, and we can therefore consider the estimation of ln (p/q). Let h = pn + a, k = qn - a. I am assuming sampling from an infinite population, and shall neglect the errors due to the fact that this is not so. Then