Computing the Type Token Relation From the A Priori Distribution of Types
For homogeneous texts, the dependence of the vocabulary size y(x) on the text length x is completely determined by the distribution D of the type probabilities. The function y(x) is derived from a simple difference equation. This solution is tested on artificial texts as well as on several German, English, and Italian texts. For the natural texts, the distributions D are approximated by the interpolation equation $w_j = \mathrm{const}/j^{p}$. This expression is adjusted to each text separately by a weighted least-squares fit. The resulting values of p lie in the range 0.36 ≤ p ≤ 0.81.
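The abstract does not state the difference equation itself. The sketch below assumes the simplest homogeneous model: every token is drawn independently from the fixed distribution D. It may differ from the paper's own formulation. Under that assumption, token x + 1 is a new type with probability $\sum_j w_j (1 - w_j)^x$. This gives the recursion $y(x+1) = y(x) + \sum_j w_j (1 - w_j)^x$ with $y(0) = 0$. The sketch also fits the exponent p of $w_j = \mathrm{const}/j^{p}$ by weighted least squares in log-log space. The helper names (`zipf_probabilities`, `expected_vocabulary`, `fit_exponent`), the log-log formulation, and the choice of weights are hypothetical. The paper's actual weighting scheme is not given in the abstract.

```python
import math


def zipf_probabilities(n_types: int, p: float) -> list[float]:
    """Return w_j = C / j**p for ranks j = 1..n_types, with C chosen so sum(w) == 1."""
    raw = [1.0 / j ** p for j in range(1, n_types + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def expected_vocabulary(w: list[float], x_max: int) -> list[float]:
    """Return [y(0), y(1), ..., y(x_max)] under an assumed independent-draw model.

    Difference equation: y(x + 1) = y(x) + sum_j w_j * (1 - w_j)**x,
    i.e. y grows by the probability that token x + 1 is a type not seen before.
    """
    y = [0.0]
    # unseen[j] = (1 - w_j)**x: probability that type j has not occurred in x tokens
    unseen = [1.0] * len(w)
    for _ in range(x_max):
        p_new = sum(wj * u for wj, u in zip(w, unseen))
        y.append(y[-1] + p_new)
        unseen = [u * (1.0 - wj) for wj, u in zip(w, unseen)]
    return y


def fit_exponent(freqs: list[float], weights: list[float]) -> tuple[float, float]:
    """Weighted least-squares fit of log f_j = log C - p * log j over ranks j = 1..n.

    freqs must be positive, sorted by rank, and contain at least two entries.
    Returns (p, C). Fitting in log-log space is an assumption for illustration.
    """
    xs = [math.log(j) for j in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    sw = sum(weights)
    mx = sum(wt * x for wt, x in zip(weights, xs)) / sw  # weighted mean of log j
    my = sum(wt * y for wt, y in zip(weights, ys)) / sw  # weighted mean of log f_j
    sxy = sum(wt * (x - mx) * (y - my) for wt, x, y in zip(weights, xs, ys))
    sxx = sum(wt * (x - mx) ** 2 for wt, x in zip(weights, xs))
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)


if __name__ == "__main__":
    w = zipf_probabilities(n_types=1000, p=0.6)
    y = expected_vocabulary(w, x_max=2000)
    print(f"y(2000) = {y[2000]:.1f} of {len(w)} types")
    # Recover p from the exact probabilities, weighting each rank by its probability
    p_hat, c_hat = fit_exponent(w, weights=w)
    print(f"fitted p = {p_hat:.3f}")
```

The sketch updates $(1 - w_j)^x$ incrementally, so each step costs time linear in the number of types. It never recomputes powers. Summing the recursion reproduces the closed form $y(x) = \sum_j \left[1 - (1 - w_j)^x\right]$, which is a convenient check.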