Compact Neural Networks based on the Multiscale Entanglement Renormalization Ansatz

This paper demonstrates a method for tensorizing neural networks based on the Multiscale Entanglement Renormalization Ansatz (MERA), an efficient way of approximating scale-invariant quantum states. We employ MERA as a replacement for the fully connected layers in a convolutional neural network and test this implementation on the CIFAR-10 and CIFAR-100 datasets. The proposed method outperforms factorization using tensor trains, providing greater compression for the same level of accuracy and greater accuracy for the same level of compression. We demonstrate MERA layers with 14,000 times fewer parameters and a reduction in accuracy of less than 1% compared to the equivalent fully connected layers; the number of parameters in a MERA layer scales as O(N) in the layer width N, rather than the O(N²) of a fully connected layer.
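To make the construction concrete: a MERA layer replaces the dense weight matrix with alternating sublayers of small "disentangler" tensors (unitaries acting on neighbouring pairs of sites) and "isometry" tensors (which coarse-grain each pair of sites into one). The following is a minimal NumPy sketch of one such coarse-graining step, not the authors' implementation; the bond dimension d, the number of sites, and the shifted pairing pattern are illustrative assumptions.

```python
import numpy as np

def mera_layer(x, disentanglers, isometries, d):
    """One MERA coarse-graining step on x of shape (d,) * n, with n even."""
    n = x.ndim
    # Disentangler sublayer: a (d*d, d*d) unitary on each neighbouring pair.
    x = x.reshape((d * d,) * (n // 2))             # fuse pairs (0,1), (2,3), ...
    for i, u in enumerate(disentanglers):
        x = np.tensordot(u, x, axes=([1], [i]))    # contract u into pair index i
        x = np.moveaxis(x, 0, i)                   # restore the index order
    # Isometry sublayer: map each pair of sites down to a single site. The
    # pairing is shifted by one site so information mixes across the pairs
    # acted on by the disentanglers above.
    x = np.moveaxis(x.reshape((d,) * n), 0, -1)    # cyclic shift of the sites
    x = x.reshape((d * d,) * (n // 2))             # fuse the shifted pairs
    for i, w in enumerate(isometries):             # each w has shape (d, d*d)
        x = np.tensordot(w, x, axes=([1], [i]))
        x = np.moveaxis(x, 0, i)
    return x                                       # shape (d,) * (n // 2)

rng = np.random.default_rng(0)
d, n = 2, 8
x = rng.normal(size=(d,) * n)                      # input reshaped into n sites

# Random unitaries and isometries via QR, for illustration only; in a network
# these would be trained parameters.
us = [np.linalg.qr(rng.normal(size=(d * d, d * d)))[0] for _ in range(n // 2)]
ws = [np.linalg.qr(rng.normal(size=(d * d, d)))[0].T for _ in range(n // 2)]

y = mera_layer(x, us, ws, d)
print(y.shape)   # (2, 2, 2, 2): 8 sites coarse-grained to 4
```

Each step uses n/2 disentanglers of d⁴ entries and n/2 isometries of d³ entries, so the parameter count grows linearly in the number of sites, consistent with the O(N) scaling claimed above; stacking such steps halves the number of sites until the desired output width is reached.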
