KAMA-NNs: Low-dimensional Rotation Based Neural Networks

We present new architectures for feedforward neural networks built from products of learned or random low-dimensional rotations. They offer substantial space compression and computational speedups over unstructured baselines, and models using them are competitive with those baselines, often outperforming them in accuracy thanks to the imposed orthogonal structure. We propose to use our architectures in two settings. In the non-adaptive scenario (random neural networks), we show that they lead to asymptotically more accurate, more space-efficient, and faster estimators of the so-called PNG (pointwise nonlinear Gaussian) kernels, for any activation function defining the PNG. This generalizes several recent theoretical results on orthogonal estimators (e.g., orthogonal Johnson-Lindenstrauss transforms and orthogonal estimators of angular kernels). In the adaptive setting, we propose efficient algorithms for learning products of low-dimensional rotations and show how our architectures can reduce the space and time complexity of state-of-the-art reinforcement learning (RL) algorithms such as PPO and TRPO. Here they offer up to 7x network compression relative to unstructured baselines, and they outperform, in terms of obtained rewards, state-of-the-art structured neural networks that offer similar computational gains and are based on low displacement rank matrices.
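To make the core building block concrete, below is a minimal NumPy sketch of the kind of layer the abstract describes: an orthogonal map assembled as a product of random two-dimensional (Givens) rotations, applied in time proportional to the number of rotations rather than n^2, together with its use as a random-feature estimator of a PNG-kernel. This is an illustration under stated assumptions, not the paper's implementation; the function names (`random_givens_product`, `png_kernel_estimate`) and the choice of roughly n log n rotations (a Kac-walk style construction) are our own.

```python
import numpy as np

def random_givens_product(n, num_rotations, rng):
    """Compose `num_rotations` random Givens rotations acting on R^n.

    Returns a callable x -> Gx applying the product in O(num_rotations)
    time, without materializing the n x n matrix.
    (Hypothetical helper; a Kac-walk style construction.)
    """
    pairs = [tuple(rng.choice(n, size=2, replace=False))  # distinct coords
             for _ in range(num_rotations)]
    thetas = rng.uniform(0.0, 2.0 * np.pi, size=num_rotations)

    def apply(x):
        y = np.array(x, dtype=float, copy=True)
        for (i, j), t in zip(pairs, thetas):
            c, s = np.cos(t), np.sin(t)
            yi, yj = y[i], y[j]
            y[i] = c * yi - s * yj  # rotate the (i, j) plane by angle t
            y[j] = s * yi + c * yj
        return y

    return apply

def png_kernel_estimate(f, x, y, rotation, norms):
    """Monte Carlo estimate of the PNG-kernel
        K(x, y) = E_{w ~ N(0, I_n)}[ f(<w, x>) f(<w, y>) ],
    with the Gaussian directions replaced by the rows of an
    (approximately) Haar-random rotation, rescaled by chi(n)-distributed
    `norms` so that each direction is still marginally Gaussian."""
    zx = norms * rotation(x)
    zy = norms * rotation(y)
    return float(np.mean(f(zx) * f(zy)))

# Usage: estimate the ReLU PNG-kernel between two random vectors.
rng = np.random.default_rng(0)
n = 256
rotation = random_givens_product(n, int(n * np.log(n)), rng)
norms = np.sqrt(rng.chisquare(df=n, size=n))
x, y = rng.standard_normal(n), rng.standard_normal(n)
relu = lambda z: np.maximum(z, 0.0)
print(png_kernel_estimate(relu, x, y, rotation, norms))
```

In the adaptive setting described above, the same parameterization could in principle be trained by optimizing the rotation angles `thetas` (e.g., by gradient or coordinate descent), which keeps the resulting map exactly orthogonal by construction.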
