KAMA-NNs: Low-dimensional Rotation Based Neural Networks

We present new architectures for feedforward neural networks built from products of learned or random low-dimensional rotations. They offer substantial space compression and computational speedups over unstructured baselines, and models using them are competitive with those baselines, often outperforming them in accuracy thanks to the imposed orthogonal structure. We propose to use our architectures in two settings. In the non-adaptive scenario (random neural networks), we show that they lead to asymptotically more accurate, more space-efficient, and faster estimators of the so-called PNG (pointwise nonlinear Gaussian) kernels, for any activation function defining the PNG. This generalizes several recent theoretical results on orthogonal estimators (e.g., orthogonal Johnson-Lindenstrauss transforms and orthogonal estimators of angular kernels). In the adaptive setting, we propose efficient algorithms for learning products of low-dimensional rotations and show how our architectures can reduce the space and time complexity of state-of-the-art reinforcement learning (RL) algorithms such as PPO and TRPO. Here they offer up to 7x network compression relative to unstructured baselines, and they outperform, in terms of obtained rewards, state-of-the-art structured neural networks that offer similar computational gains and are based on low displacement rank matrices.
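To make the core building block concrete, below is a minimal NumPy sketch of the kind of layer the abstract describes: an orthogonal map assembled as a product of random two-dimensional (Givens) rotations, applied in time proportional to the number of rotations rather than n^2, together with its use as a random-feature estimator of a PNG-kernel. This is an illustration under stated assumptions, not the paper's implementation; the function names (`random_givens_product`, `png_kernel_estimate`) and the choice of roughly n log n rotations (a Kac-walk style construction) are our own.

```python
import numpy as np

def random_givens_product(n, num_rotations, rng):
    """Compose `num_rotations` random Givens rotations acting on R^n.

    Returns a callable x -> Gx applying the product in O(num_rotations)
    time, without materializing the n x n matrix.
    (Hypothetical helper; a Kac-walk style construction.)
    """
    pairs = [tuple(rng.choice(n, size=2, replace=False))  # distinct coords
             for _ in range(num_rotations)]
    thetas = rng.uniform(0.0, 2.0 * np.pi, size=num_rotations)

    def apply(x):
        y = np.array(x, dtype=float, copy=True)
        for (i, j), t in zip(pairs, thetas):
            c, s = np.cos(t), np.sin(t)
            yi, yj = y[i], y[j]
            y[i] = c * yi - s * yj  # rotate the (i, j) plane by angle t
            y[j] = s * yi + c * yj
        return y

    return apply

def png_kernel_estimate(f, x, y, rotation, norms):
    """Monte Carlo estimate of the PNG-kernel
        K(x, y) = E_{w ~ N(0, I_n)}[ f(<w, x>) f(<w, y>) ],
    with the Gaussian directions replaced by the rows of an
    (approximately) Haar-random rotation, rescaled by chi(n)-distributed
    `norms` so that each direction is still marginally Gaussian."""
    zx = norms * rotation(x)
    zy = norms * rotation(y)
    return float(np.mean(f(zx) * f(zy)))

# Usage: estimate the ReLU PNG-kernel between two random vectors.
rng = np.random.default_rng(0)
n = 256
rotation = random_givens_product(n, int(n * np.log(n)), rng)
norms = np.sqrt(rng.chisquare(df=n, size=n))
x, y = rng.standard_normal(n), rng.standard_normal(n)
relu = lambda z: np.maximum(z, 0.0)
print(png_kernel_estimate(relu, x, y, rotation, norms))
```

In the adaptive setting described above, the same parameterization could in principle be trained by optimizing the rotation angles `thetas` (e.g., by gradient or coordinate descent), which keeps the resulting map exactly orthogonal by construction.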
