Is Deep Learning a Renormalization Group Flow?

Although practical applications of deep learning have developed rapidly, theoretical explanations of its success are still in their infancy. Deep learning performs a sophisticated form of coarse graining. Since coarse graining is a key ingredient of the renormalization group (RG), the RG may provide a theoretical framework directly relevant to deep learning. In this study we pursue this possibility. A statistical mechanics model of a magnet, the Ising model, is used to train an unsupervised restricted Boltzmann machine (RBM). The patterns generated by the trained RBM are compared to the configurations obtained from an RG treatment of the Ising model. Although we are motivated by the connection between deep learning and RG flow, in this study we focus mainly on comparing a single layer of a deep network to a single step of the RG flow. We argue that correlation functions between hidden and visible neurons can diagnose RG-like coarse graining. Numerical experiments show the presence of RG-like patterns in correlators computed with the trained RBMs, while the same observables also exhibit important differences between RG and deep learning.
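To make the comparison concrete, the sketch below illustrates the two ingredients the abstract names: a real-space RG step on Ising configurations (here a simple 2x2 majority-rule block-spin transformation) and the connected correlators between hidden and visible units of an RBM trained by one-step contrastive divergence (CD-1). This is a minimal illustration under stated assumptions, not the paper's implementation; the lattice size, hyperparameters, function names, and the use of random rather than Monte Carlo Ising samples are all illustrative.

```python
# Minimal sketch (not the paper's code): a 2x2 majority-rule block-spin step on
# Ising configurations, an RBM trained by CD-1, and hidden-visible correlators.
import numpy as np

rng = np.random.default_rng(0)

def block_spin(config):
    """One real-space RG step: 2x2 majority rule on an L x L lattice of +/-1 spins."""
    L = config.shape[0]
    blocks = config.reshape(L // 2, 2, L // 2, 2).sum(axis=(1, 3))
    coarse = np.sign(blocks)
    ties = coarse == 0
    coarse[ties] = rng.choice([-1, 1], size=ties.sum())  # break 2-2 ties randomly
    return coarse

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM on {0,1} units; +/-1 spins are mapped to {0,1} before training."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def hidden_prob(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_prob(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0):
        """One CD-1 gradient step on a batch of visible vectors."""
        ph0 = self.hidden_prob(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = self.visible_prob(h0)
        v1 = (rng.random(pv1.shape) < pv1).astype(float)
        ph1 = self.hidden_prob(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)

def hidden_visible_correlators(rbm, v):
    """Connected correlators <v_i h_j> - <v_i><h_j>, averaged over samples."""
    ph = rbm.hidden_prob(v)
    return (v.T @ ph) / len(v) - np.outer(v.mean(axis=0), ph.mean(axis=0))

# Illustrative usage on random spin configurations; a real study would use
# Monte Carlo samples of the Ising model near its critical temperature.
L = 16
configs = rng.choice([-1, 1], size=(500, L, L))
coarse = np.array([block_spin(c) for c in configs])   # RG-coarse-grained samples
v_data = ((configs.reshape(len(configs), -1) + 1) // 2).astype(float)
rbm = RBM(n_visible=L * L, n_hidden=(L // 2) ** 2)
for epoch in range(20):
    rbm.cd1_update(v_data)
C = hidden_visible_correlators(rbm, v_data)           # shape (L*L, n_hidden)
print("coarse lattice:", coarse.shape, "correlator matrix:", C.shape)
```

In this setup each hidden unit plays the role of one coarse-grained spin, so the correlator matrix `C` can be inspected for block-like spatial structure of the kind a majority-rule RG step would produce.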
