Numerical Analysis of Word Frequencies in Artificial and Natural Language Texts
We perform a numerical study of the statistical properties of natural texts written in English and of two types of artificial texts. As statistical tools we use the conventional Zipf analysis of the distribution of words, the inverse Zipf analysis of the distribution of word frequencies, the analysis of vocabulary growth, the Shannon entropy, and the frequency "entropy", a quantity that is a nonlinear function of word frequencies. Our numerical results, obtained from eight complete books and sixteen related artificial texts, suggest that, among these analyses, vocabulary growth shows the most striking difference between natural and artificial texts. They also suggest that the analyses that give greater weight to low-frequency words are better at distinguishing natural from artificial texts. The inverse Zipf analysis seems to succeed better than the conventional Zipf analysis and the frequency "entropy" better than the usual word ...
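As a rough illustration of four of the quantities named in the abstract, the following minimal Python sketch computes the rank-ordered word counts (conventional Zipf), the number of distinct words at each frequency (inverse Zipf), the vocabulary growth curve, and the Shannon entropy of the word distribution. The tokenizer and all function names are assumptions made here for illustration and are not taken from the paper. The frequency "entropy" is left out because its definition is not given in this excerpt.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Assumption: a word is a run of lowercase letters or apostrophes.
    # The paper's own tokenization rules are not stated in this excerpt.
    return re.findall(r"[a-z']+", text.lower())


def zipf_ranking(tokens):
    # Conventional Zipf analysis: word counts sorted in decreasing order,
    # so the value at index r - 1 is the count of the word of rank r.
    return sorted(Counter(tokens).values(), reverse=True)


def inverse_zipf(tokens):
    # Inverse Zipf analysis: for each frequency f, the number of
    # distinct words that occur exactly f times.
    word_counts = Counter(tokens)
    return dict(sorted(Counter(word_counts.values()).items()))


def vocabulary_growth(tokens):
    # Number of distinct words seen after reading the first n tokens.
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def shannon_entropy(tokens):
    # Shannon entropy (in bits) of the empirical word distribution.
    total = len(tokens)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(tokens).values()
    )


if __name__ == "__main__":
    sample = tokenize("the cat and the dog and the bird")
    print(zipf_ranking(sample))
    print(inverse_zipf(sample))
    print(vocabulary_growth(sample))
    print(shannon_entropy(sample))
```

On a full book, plotting `zipf_ranking` and `inverse_zipf` on log-log axes and `vocabulary_growth` against text length would give curves of the kind the abstract compares between natural and artificial texts.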