Tensor network language model

We propose a new statistical model suitable for machine learning of systems with long-distance correlations, such as natural languages. The model is based on a directed acyclic graph decorated by multilinear tensor maps at the vertices and vector spaces on the edges, called a tensor network. Such tensor networks have previously been employed for efficient numerical computation of the renormalization group flow on the space of effective quantum field theories and lattice models of statistical mechanics. We provide an explicit algebro-geometric analysis of the parameter moduli space for tree graphs, and discuss model properties and applications such as statistical translation.
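To make the construction concrete, the following is a minimal sketch of a tree-shaped tensor network evaluated on short token sequences. It is an illustration under stated assumptions, not the construction from the paper: the names `tree_amplitude` and `sequence_probabilities`, the balanced binary tree, the single vertex tensor shared by all internal vertices, and the use of a squared amplitude as an unnormalized probability (one convention found in the tensor network literature) are all choices made here for illustration.

```python
import itertools
import numpy as np


def tree_amplitude(tokens, embed, vertex, root):
    """Contract a balanced binary tree over `tokens`.

    Assumptions (hypothetical, for illustration only):
    - len(tokens) is a power of two,
    - embed[t] is the vector placed on the leaf edge for token t,
    - the same multilinear map `vertex` (shape d x d x d) sits at every
      internal vertex and merges two child vectors into one parent vector,
    - `root` (shape d) closes the network into a scalar.
    """
    vectors = [embed[t] for t in tokens]
    while len(vectors) > 1:
        vectors = [
            np.einsum("i,j,ijk->k", vectors[i], vectors[i + 1], vertex)
            for i in range(0, len(vectors), 2)
        ]
    return float(vectors[0] @ root)


def sequence_probabilities(length, embed, vertex, root):
    """Brute-force normalization over all sequences of the given length.

    Assumption: the probability is proportional to the squared amplitude.
    Enumerating every sequence is only feasible for tiny vocabularies.
    """
    vocab_size = embed.shape[0]
    sequences = list(itertools.product(range(vocab_size), repeat=length))
    weights = np.array(
        [tree_amplitude(s, embed, vertex, root) ** 2 for s in sequences]
    )
    return dict(zip(sequences, weights / weights.sum()))


# Toy parameters: random tensors, not trained on any data.
rng = np.random.default_rng(0)
vocab_size, bond_dim = 3, 4
embed = rng.normal(size=(vocab_size, bond_dim))
vertex = rng.normal(size=(bond_dim, bond_dim, bond_dim))
root = rng.normal(size=bond_dim)

probs = sequence_probabilities(4, embed, vertex, root)
```

Because every vertex map is multilinear, the amplitude is linear in each leaf vector separately; in this sketch the first and last tokens interact only through the vectors carried on the internal edges of the tree, whose dimension `bond_dim` is a free parameter here.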
