It is the purpose of this paper to analyse a class of distribution functions that appears in a wide range of empirical data, particularly data describing sociological, biological and economic phenomena. Its appearance is so frequent, and the phenomena in which it appears so diverse, that one is led to conjecture that if these phenomena have any property in common, it can only be a similarity in the structure of the underlying probability mechanisms. The empirical distributions to which we shall refer specifically are: (A) distributions of words in prose samples by their frequency of occurrence, (B) distributions of scientists by number of papers published, (C) distributions of cities by population, (D) distributions of incomes by size, and (E) distributions of biological genera by number of species.

No one supposes that there is any connexion between horse-kicks suffered by soldiers in the German army and blood cells on a microscope slide, other than that the same urn scheme provides a satisfactory abstract model of both phenomena. It is in the same direction that we shall look for an explanation of the observed close similarities among the five classes of distributions listed above.

The observed distributions have the following characteristics in common: (a) They are J-shaped, or at least highly skewed, with very long upper tails. The tails can generally be approximated closely by a function of the form