Ordinal analysis of lexical patterns

Words are fundamental linguistic units that connect thoughts and things through meaning. However, words do not appear independently in a text sequence: syntactic rules induce correlations among neighboring words. Using an ordinal pattern approach, we analyze the statistical connections between words in 11 major languages. We find that the diverse ways in which languages express relations between words give rise to language-specific distributions of ordinal pattern structures. Furthermore, fluctuations in these pattern distributions within a given language can be used to determine both the historical period in which a text was written and its author. Taken together, our results emphasize the relevance of ordinal time series analysis to linguistic typology, historical linguistics, and stylometry.
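The abstract does not specify how a word sequence is turned into a numeric series or which pattern statistics are computed. The Python sketch below illustrates the general Bandt-Pompe ordinal-pattern technique (the basis of permutation entropy) under a loudly stated assumption: each word is replaced by its frequency rank in the text. The function names, the rank encoding, the tie-breaking rule, and the embedding dimension `d=3` are all hypothetical illustrative choices, not necessarily those used in the study.

```python
from collections import Counter
from itertools import permutations
from math import log


def frequency_ranks(words):
    """Map each word to its frequency rank (1 = most frequent).

    Hypothetical encoding: the abstract does not say how words become numbers.
    Ties in frequency are broken alphabetically so the result is deterministic.
    """
    counts = Counter(words)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    rank = {w: i + 1 for i, w in enumerate(ordered)}
    return [rank[w] for w in words]


def ordinal_pattern(window):
    """Return the index permutation that sorts the window (its ordinal symbol).

    Equal values keep their order of appearance (Python's sort is stable).
    """
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def pattern_distribution(series, d=3):
    """Relative frequency of each of the d! ordinal patterns of length d."""
    n = len(series) - d + 1
    if n <= 0:
        raise ValueError("series is shorter than the embedding dimension d")
    counts = Counter(ordinal_pattern(series[i:i + d]) for i in range(n))
    # Include unobserved patterns with probability 0 so all d! keys are present.
    return {p: counts.get(p, 0) / n for p in permutations(range(d))}


def permutation_entropy(dist):
    """Shannon entropy of a pattern distribution, normalized to [0, 1]."""
    h = -sum(p * log(p) for p in dist.values() if p > 0)
    return h / log(len(dist))


if __name__ == "__main__":
    words = "the cat sat on the mat and the dog sat on the rug".split()
    series = frequency_ranks(words)
    print(series)  # [1, 5, 3, 2, 1, 7, 4, 1, 6, 3, 2, 1, 8]
    dist = pattern_distribution(series, d=3)
    print(round(permutation_entropy(dist), 3))
```

Comparing such distributions (or entropy-like summaries of them) across languages, periods, or authors is the kind of analysis the abstract describes; the specific encoding and comparison measures used in the study are not given here.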
