A Note on Zipf's Law, Natural Languages, and Noncoding DNA Regions
In Phys. Rev. Letters (73:2), Mantegna et al. conclude, on the basis of Zipf rank-frequency data, that noncoding DNA sequence regions are more like natural languages than coding regions are. We argue, on the contrary, that an empirical fit to Zipf's "law" cannot be used as a criterion for similarity to natural languages. Although DNA is presumably an "organized system of signs" in Mandelbrot's (1961) sense, observing statistical features of the sort presented in the Mantegna et al. paper does not shed light on the similarity between DNA's "grammar" and natural-language grammars, just as observing exact Zipf-like behavior cannot distinguish between the underlying processes of tossing an M-sided die and a finite-state branching process.