Computing the Type Token Relation From the A Priori Distribution of Types

For homogeneous texts, the dependence of the vocabulary size y(x) on text length x is completely determined by the distribution D of the type probabilities. The function y(x) is derived from a simple difference equation. This solution is checked against artificial texts as well as several German, English, and Italian texts. For the natural texts, the distributions D are approximated by the interpolation equation $w_j = \mathrm{const}/j^{p}$. This expression is fitted to each text separately by a weighted least-squares fit. The values obtained lie in the range 0.36 ≤ p ≤ 0.81.
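
The abstract does not state the difference equation itself. As a minimal sketch, assume that tokens are drawn independently from D, that j denotes the rank of a type, and that the number of types is finite (n_types below is a hypothetical cutoff introduced only so that the probabilities can be normalized). Under those assumptions the expected vocabulary size satisfies y(x+1) - y(x) = sum_j w_j (1 - w_j)^x with y(0) = 0, which has the closed form y(x) = sum_j [1 - (1 - w_j)^x]. The function names and parameter values are illustrative and are not taken from the paper.

```python
def power_law_probs(n_types, p):
    """Type probabilities w_j = const / j**p, normalized over an assumed finite vocabulary."""
    weights = [j ** (-p) for j in range(1, n_types + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def expected_vocabulary(w, x):
    """Closed form under independent sampling: y(x) = sum_j (1 - (1 - w_j)**x)."""
    return sum(1.0 - (1.0 - wj) ** x for wj in w)


def vocabulary_by_difference_equation(w, x_max):
    """Iterate y(x+1) = y(x) + sum_j w_j * (1 - w_j)**x, starting from y(0) = 0."""
    y = [0.0]
    unseen = [1.0] * len(w)  # unseen[j] holds (1 - w_j)**x, the chance type j has not appeared yet
    for _ in range(x_max):
        # Probability that the next token is a new type, added to the running vocabulary size.
        y.append(y[-1] + sum(wj * u for wj, u in zip(w, unseen)))
        unseen = [u * (1.0 - wj) for wj, u in zip(w, unseen)]
    return y


# Illustrative values only: p = 0.6 lies inside the reported range 0.36 <= p <= 0.81.
w = power_law_probs(n_types=2000, p=0.6)
y = vocabulary_by_difference_equation(w, x_max=500)
```

Here y[x] is the expected number of distinct types after x tokens. Iterating the recurrence and evaluating the closed form should agree up to floating-point error, which gives a quick consistency check on both.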