Equilibrium (Zipf) and Dynamic (Grassberger-Procaccia) method based analyses of human texts. A comparison of natural (English) and artificial (Esperanto) languages

Two English texts by Lewis Carroll are compared: Alice in Wonderland, which is also examined in its Esperanto translation, and Through the Looking Glass. The aim is to observe whether natural and artificial languages differ significantly from each other. One-dimensional, time-series-like signals are constructed using only word frequencies (FTS) or word lengths (LTS). The data are studied through (i) a Zipf method for sorting out correlations in the FTS and (ii) a Grassberger–Procaccia (GP) technique-based method for finding correlations in the LTS. The methods correspond, respectively, to an equilibrium and a dynamic approach to the features of human texts. There are quantitative statistical differences between the original English text and its Esperanto translation, but the qualitative differences are very minute. However, different power laws are observed, with characteristic exponents for the ranking properties and for the phase space attractor dimensionality. The Zipf exponent can take values much less than unity (∼0.50 or 0.30), depending on how a sentence is defined. This variety in exponents can be conjectured to be an intrinsic measure of the book's style or purpose, rather than of the language or the author's vocabulary richness, since a similar exponent is obtained whatever the text. Moreover, the attractor dimension r is a simple function of the so-called phase space dimension n, i.e., r = n^λ, with λ = 0.79. Such an exponent could also be conjectured to be a measure of the author's stylistic versatility, which is here well preserved in the translation.
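As a rough illustration of the two kinds of analysis named in the abstract, the following minimal Python sketch builds an LTS and an FTS from a text, fits a rank-frequency (Zipf) exponent, and evaluates a Grassberger–Procaccia correlation sum. It is not the author's procedure. The tokenization rule, the reading of the FTS as "each word replaced by its count in the text", the plain rank-frequency fit, the unit time delay, and the maximum norm are all assumptions made here for illustration, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (assumed word definition)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def length_series(words):
    """LTS: replace each word by its number of letters."""
    return [len(w) for w in words]


def frequency_series(words):
    """FTS (assumed form): replace each word by its count in the whole text."""
    counts = Counter(words)
    return [counts[w] for w in words]


def zipf_exponent(words):
    """Least-squares slope of log(frequency) versus log(rank), sign flipped."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def correlation_sum(series, n, eps):
    """GP correlation sum C(eps): fraction of pairs of n-dimensional delay
    vectors (delay 1, maximum norm; both assumed) closer than eps."""
    vectors = [series[i:i + n] for i in range(len(series) - n + 1)]
    m = len(vectors)
    close = 0
    for i in range(m):
        for j in range(i + 1, m):
            dist = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
            if dist < eps:
                close += 1
    return 2.0 * close / (m * (m - 1))
```

In a GP-type analysis, the attractor dimension for a given embedding dimension n would be estimated from the slope of log C(eps) against log eps over a scaling range of eps; repeating this for several n is what would give a relation of the form r = n^λ. The double loop is quadratic in the series length, so a full-length book would need a faster neighbour search than this sketch provides.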
