Language Design as Information Renormalization

Here we consider some well-known facts of syntax from a physics perspective, which allows us to establish equivalences between the two fields, with several consequences. Our main observation is that the operation MERGE, put forward by N. Chomsky in 1995, can be interpreted as a physical information coarse-graining. MERGE in linguistics thus entails information renormalization in physics across different time scales. We make this point mathematically precise in terms of language models. In this setting, MERGE amounts to a probability tensor implementing a coarse-graining, akin to a probabilistic context-free grammar. The probability vectors of meaningful sentences are given by stochastic tensor networks (TNs) that are built from diagonal tensors and are mostly loop-free, such as tree tensor networks and matrix product states, and are therefore computationally very efficient to manipulate. We show that this structure implies the polynomially decaying (long-range) correlations experimentally observed in language, and that it also provides arguments in favour of certain types of neural networks for language processing. Moreover, we show how to obtain such language models from quantum states that can be efficiently prepared on a quantum computer, and use this to derive bounds on the perplexity of the probability distribution of words in a sentence. Implications of our results are discussed across several domains.
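To make the tensor-network picture concrete, the sketch below shows how sentence probabilities can be evaluated in a stochastic matrix product state, one of the loop-free networks mentioned above. It is a minimal illustration under stated assumptions, not the paper's MERGE construction: the vocabulary size V, the bond dimension D, and the random nonnegative tensors are placeholders chosen only to show that evaluating a sentence probability reduces to a chain of small matrix contractions whose cost grows linearly with sentence length.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's construction): a stochastic
# matrix-product-state (MPS) language model.  The unnormalized weight of a
# sentence w_1 ... w_N is  left . A[w_1] ... A[w_N] . right,  with one
# nonnegative D x D matrix per word, so evaluation is linear in length.

rng = np.random.default_rng(0)
V, D = 6, 4                        # placeholder vocabulary size and bond dimension

A = rng.random((V, D, D))          # one nonnegative D x D matrix per word
left = rng.random(D)               # boundary vectors of the chain
right = rng.random(D)

def weight(sentence):
    """Unnormalized weight: contract one MPS tensor per word, left to right."""
    v = left
    for w in sentence:
        v = v @ A[w]
    return v @ right

def probability(sentence):
    """Normalize over all word sequences of the same length by summing the
    word index first (transfer-matrix trick), keeping the cost polynomial."""
    T = A.sum(axis=0)              # transfer matrix: sum over the vocabulary
    Z = left @ np.linalg.matrix_power(T, len(sentence)) @ right
    return weight(sentence) / Z

sentence = [2, 0, 5, 1]            # toy sentence encoded as word indices
print(probability(sentence))
```

One caveat on the design choice: an MPS of fixed bond dimension yields exponentially decaying word-word correlations governed by the transfer matrix spectrum; the polynomially decaying correlations discussed in the abstract are instead associated with hierarchical, tree-like tensor networks, which can be contracted with the same kind of low-cost sweep.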
