Random Language Model: a path to principled complexity

Many complex generative systems use languages to create structured objects. We consider a model of random languages, defined by weighted context-free grammars. As the distribution of grammar weights broadens, a transition is found from a random phase, in which sentences are indistinguishable from noise, to an organized phase in which they carry nontrivial information. This transition marks the emergence of deep structure in the language and can be understood as a competition between energy and entropy.
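To make the model concrete, here is a minimal sketch, not taken from the paper, of sampling sentences from a context-free grammar in Chomsky normal form whose rule weights are drawn at random. The grammar sizes N and T, the log-normal weight law, the parameter epsilon, and the depth cap are all illustrative assumptions; the paper's precise weight ensemble may differ.

```python
# Sketch: random weighted CFG in Chomsky normal form (assumed setup,
# not the authors' code). Rules are A -> B C (binary) and A -> a (unary).
import numpy as np

rng = np.random.default_rng(0)

N, T = 5, 5       # assumed numbers of nonterminals and terminals
epsilon = 2.0     # assumed scale: larger epsilon = broader log-weights

# Log-normal rule weights; as epsilon grows, a few rules dominate.
W_binary = np.exp(epsilon * rng.standard_normal((N, N, N)))
W_unary = np.exp(epsilon * rng.standard_normal((N, T)))

def expand(symbol, depth, max_depth=12):
    """Recursively expand a nonterminal into a list of terminal indices."""
    # Near the depth cap, suppress binary rules so sentences stay finite.
    w_bin = W_binary[symbol].ravel() if depth < max_depth else np.zeros(N * N)
    weights = np.concatenate([w_bin, W_unary[symbol]])
    choice = rng.choice(len(weights), p=weights / weights.sum())
    if choice < N * N:                      # binary rule A -> B C
        b, c = divmod(choice, N)
        return expand(b, depth + 1) + expand(c, depth + 1)
    return [choice - N * N]                 # unary rule A -> a

print(expand(symbol=0, depth=0))            # one sentence from the start symbol
```

Increasing epsilon broadens the distribution of log-weights, so each expansion is dominated by a few rules and the generated sentences become statistically distinguishable from noise, the qualitative transition the abstract describes.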
