THE NUMBER OF NEW SPECIES, AND THE INCREASE IN POPULATION COVERAGE, WHEN A SAMPLE IS INCREASED

A sample of size N is drawn at random from a population of animals of various species. Methods are given for estimating, from the contents of this sample alone, the number of species that will be represented r times in a second sample of size λN; these methods also enable us to estimate the number of different species, and the proportion of the whole population, represented in the second sample. A formula is found for the variance of the estimate; when λ > 2 this variance is in general very large, so that the estimate is useless without some modification. The difficulty can be partly overcome, at least for λ < 5, by using Euler's method with a suitable parameter, or the methods described by Shanks (1955), to hasten the convergence of the series in which the estimate is expressed. The methods are applied to samples of words from Our Mutual Friend, to an entomological sample, and to a sample of nouns from Macaulay's essay on Bacon.
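The sketch below illustrates the kind of computation the abstract describes, under stated assumptions. It assumes the estimate of the number of new species (those absent from the first sample) in a second sample of size lam*N takes the commonly cited alternating form lam*n_1 - lam^2*n_2 + lam^3*n_3 - ..., where n_r is the number of species seen exactly r times in the first sample. The first sample then reduces to the counts n_r. The sketch accelerates the partial sums of this series with one pass of the Shanks (1955) transformation, e(A_n) = (A_{n+1}A_{n-1} - A_n^2)/(A_{n+1} - 2A_n + A_{n-1}). The function names and sample data are hypothetical. Euler's method and the variance formula are not implemented, and this is not presented as the paper's own procedure.

```python
from collections import Counter


def frequency_counts(sample):
    """Return {r: n_r}, where n_r is the number of species seen exactly r times."""
    per_species = Counter(sample)            # species -> number of occurrences
    return dict(Counter(per_species.values()))  # r -> number of species with r occurrences


def new_species_partial_sums(n_r, lam):
    """Partial sums of lam*n_1 - lam^2*n_2 + lam^3*n_3 - ... (assumed series form).

    lam is the ratio of the second sample's size to the first sample's size N.
    """
    if not n_r:
        return []
    sums = []
    total = 0.0
    for r in range(1, max(n_r) + 1):
        # Alternating sign: + for r = 1, - for r = 2, + for r = 3, ...
        total += (-1) ** (r + 1) * lam ** r * n_r.get(r, 0)
        sums.append(total)
    return sums


def shanks(seq):
    """One pass of the Shanks transformation over consecutive triples of seq."""
    out = []
    for a0, a1, a2 in zip(seq, seq[1:], seq[2:]):
        denom = a2 - 2 * a1 + a0
        # Fallback (an assumption of this sketch): keep the middle term
        # when the second difference vanishes.
        out.append(a1 if denom == 0 else (a2 * a0 - a1 * a1) / denom)
    return out


if __name__ == "__main__":
    sample = list("aabcdddeef")   # hypothetical sample: one letter per individual
    n_r = frequency_counts(sample)
    raw = new_species_partial_sums(n_r, lam=0.5)
    print(n_r, raw, shanks(raw))
```

When lam exceeds 1 the terms lam^r * n_r can grow with r, so the raw partial sums oscillate with growing amplitude. One pass of the Shanks transformation can sum some such series. For example, it maps the partial sums 1, -1, 3 of the divergent geometric series 1 - 2 + 4 - ... to 1/3. That is no guarantee for real frequency data, and it does not address the large variance noted above.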