Punctuation effects in English and Esperanto texts

A statistical physics study of punctuation effects on sentence lengths is presented for two written texts: Alice in Wonderland and Through the Looking Glass. The translation of the first text into Esperanto is also considered, both as a test of the role of punctuation in defining a style and as a way to contrast natural and artificial, but written, languages. Several log–log plots of the sentence-length–rank relationship are presented for the major punctuation marks. Different power laws are observed, each with a characteristic exponent. The exponent can take a value much less than unity (ca. 0.50 or 0.30), depending on how a sentence is defined. The texts are also mapped into time series based on the word frequencies. The quantitative differences between the original and translated texts are very minute at the exponent level. It is argued that sentence-length distributions seem to be more reliable than word distributions when discussing an author's style.
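
The sentence-length–rank analysis summarized above can be illustrated with a minimal Python sketch. It is not the procedure used in the study, only one plausible reading of it, under the following assumptions: a "sentence" is any run of words ending at one of a chosen set of punctuation marks, its length is counted in words, and the exponent is taken as the ordinary least-squares slope of log(length) against log(rank). The function names `sentence_lengths`, `rank_exponent`, and `frequency_series` are hypothetical, and the mapping of each word to its overall count in the text is only one possible way to build a time series from word frequencies.

```python
import math
import re
from collections import Counter


def sentence_lengths(text, marks="."):
    """Split text at any character in `marks`; return the word count of each non-empty segment."""
    pattern = "[" + re.escape(marks) + "]"
    segments = re.split(pattern, text)
    lengths = [len(segment.split()) for segment in segments]
    return [n for n in lengths if n > 0]


def rank_exponent(lengths):
    """Estimate zeta in length ~ rank**(-zeta) by least squares on the log-log rank plot."""
    if len(lengths) < 2:
        raise ValueError("need at least two sentence lengths to fit a slope")
    ordered = sorted(lengths, reverse=True)  # rank 1 is the longest sentence
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(length) for length in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # sign flipped so a decaying law gives a positive exponent


def frequency_series(text):
    """Assumed mapping: replace each word, in reading order, by its total count in the text."""
    words = text.lower().split()
    counts = Counter(words)
    return [counts[word] for word in words]
```

Changing `marks` changes what counts as a sentence: for example, `sentence_lengths(text, ".")` splits only at full stops, while `sentence_lengths(text, ".;")` also splits at semicolons, which generally yields more and shorter segments and therefore a different fitted exponent.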
