Computing the Type Token Relation From the A Priori Distribution of Types

For homogeneous texts, the dependence of the vocabulary size y(x) (the number of distinct types) on the text length x (the number of tokens) is completely determined by the distribution D of the type probabilities. The function y(x) is derived from a simple difference equation. This solution is checked against artificial texts as well as several German, English and Italian texts. For the natural texts, the distributions D are approximated by the interpolation equation $w_j = \mathrm{const}/j^p$, where $w_j$ is the probability of type $j$. This expression is adjusted separately for each text by a weighted least-squares fit. The fitted exponents lie in the range $0.36 \le p \le 0.81$.
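The claim that y(x) follows from D alone can be illustrated with a small sketch. It assumes that a homogeneous text behaves like independent draws from the type probabilities $w_j$ over a finite number of types, and uses the standard difference equation $y(x+1) - y(x) = \sum_j w_j (1 - w_j)^x$ (the probability that the next token is a new type). This is not necessarily the exact equation used in the paper. The function names are hypothetical, and weighting each point of the log-log fit by its type count is an assumed choice, since the abstract does not specify the weights.

```python
import math
import random
from collections import Counter


def power_law_probs(n_types, p):
    """Type probabilities w_j = const / j**p for j = 1..n_types.

    The constant normalizes the sum to 1; a finite n_types is assumed
    because the sum diverges for p <= 1.
    """
    weights = [1.0 / j ** p for j in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_vocabulary(probs, x_max):
    """Expected vocabulary size y(x) for x = 0..x_max via the difference
    equation y(x+1) - y(x) = sum_j w_j * (1 - w_j)**x (independent draws)."""
    y = [0.0]
    unseen = [1.0] * len(probs)  # (1 - w_j)**x: chance type j is still unseen
    for _ in range(x_max):
        increment = sum(w * u for w, u in zip(probs, unseen))
        y.append(y[-1] + increment)
        unseen = [u * (1.0 - w) for w, u in zip(probs, unseen)]
    return y


def simulate_vocabulary(probs, x_max, seed=0):
    """Artificial text: draw x_max tokens and record y(x) after each token."""
    rng = random.Random(seed)
    tokens = rng.choices(range(len(probs)), weights=probs, k=x_max)
    seen = set()
    y = [0]
    for t in tokens:
        seen.add(t)
        y.append(len(seen))
    return y, tokens


def fit_exponent(counts):
    """Weighted least-squares fit of log f_j = log c - p * log j.

    counts are type frequencies sorted in decreasing order (j = 1, 2, ...).
    Weighting each point by its count is a hypothetical choice.
    """
    total = sum(counts)
    xs = [math.log(j) for j in range(1, len(counts) + 1)]
    ys = [math.log(n / total) for n in counts]
    ws = [float(n) for n in counts]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return -sxy / sxx  # slope is -p


if __name__ == "__main__":
    probs = power_law_probs(n_types=500, p=0.6)
    expected = expected_vocabulary(probs, x_max=2000)
    observed, tokens = simulate_vocabulary(probs, x_max=2000)
    for x in (100, 500, 1000, 2000):
        print(x, round(expected[x], 1), observed[x])
    counts = sorted(Counter(tokens).values(), reverse=True)
    print("fitted p:", round(fit_exponent(counts), 2))
```

Summing the increments telescopes to the closed form $y(x) = \sum_j \left(1 - (1 - w_j)^x\right)$, which is a convenient check. A p fitted from a short artificial sample is only a rough estimate, because rare types are poorly sampled.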

Dieter Müller