Probability Distributions of Language Entities*

Continuing Best (1998), this paper presents new investigations from the Göttingen Project on Quantitative Linguistics, which aims to examine the laws governing the frequency distributions of different kinds of linguistic units in texts and lexica. The main topic has been the distribution of word lengths in texts; so far, more than 40 languages have been investigated, with promising results. In the meantime, some word length distributions in lexica have been considered as well, along with the distributions of many other entities in texts. New results concerning the distributions of parts of speech suggest a more general validity of the law, which was originally intended for word length distributions only. So far, very few test results fail to support it. The law of probability distributions for classes of entities can be seen as one kind of ‘horizontal’ structuring of language, alongside others such as the distributions of single entities (graphemes, phonemes, word forms, etc.), which follow several empirical distributions (Zipf-Mandelbrot, geometric, and hypergeometric distributions); the Menzerath-Altmann law provides a ‘vertical’ structuring. Together with the Köhlerian circle, these findings imply that language and texts must be conceived of as structured in multiple ways.