Stochastic Flows and Geometric Optimization on the Orthogonal Group

We present a new class of stochastic, geometrically driven optimization algorithms on the orthogonal group $O(d)$ and on naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We demonstrate, both theoretically and experimentally, that our methods can be applied in various fields of machine learning, including deep, convolutional, and recurrent neural networks, reinforcement learning, normalizing flows, and metric learning. We show an intriguing connection between efficient stochastic optimization on the orthogonal group and graph theory (e.g., the matching problem, partition functions over graphs, and graph coloring). We leverage the theory of Lie groups to provide theoretical results for the proposed class of algorithms. We demonstrate the broad applicability of our methods through strong performance on two seemingly unrelated tasks: learning world models to obtain stable policies for Humanoid, the most difficult agent from OpenAI Gym, and improving convolutional neural networks.
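To make the geometric setting concrete, below is a minimal sketch of one retraction-based gradient step on $O(d)$: the Euclidean gradient is projected onto the tangent space of skew-symmetric directions at the current point, and the iterate is retracted via the matrix exponential, so it remains exactly on the manifold. This is standard machinery for optimization under orthogonality constraints, not the specific stochastic-flow algorithm proposed in the paper; the toy objective, step size, and helper name `riemannian_sgd_step` are illustrative assumptions.

```python
# Illustrative sketch: Riemannian gradient descent on the orthogonal group O(d).
# Standard retraction-based step, not the paper's specific algorithm.
import numpy as np
from scipy.linalg import expm

def riemannian_sgd_step(X, euclidean_grad, eta=1e-2):
    """One gradient step that keeps X on O(d).

    X: current orthogonal d x d matrix; euclidean_grad: dF/dX evaluated at X.
    """
    # Project the Euclidean gradient onto the tangent space at X, which is
    # {X @ Omega : Omega skew-symmetric} under the bi-invariant metric.
    M = X.T @ euclidean_grad
    Omega = 0.5 * (M - M.T)            # skew-symmetric descent direction
    # Retract along the geodesic: exp of a skew-symmetric matrix is orthogonal,
    # so the updated iterate stays exactly on the manifold.
    return X @ expm(-eta * Omega)

# Toy usage: minimize f(X) = ||X - A||_F^2 over X in O(d) (orthogonal Procrustes).
d = 5
rng = np.random.default_rng(0)
A = rng.standard_normal((d, d))
X, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random starting point on O(d)
for _ in range(200):
    X = riemannian_sgd_step(X, 2.0 * (X - A), eta=0.05)
print(np.linalg.norm(X.T @ X - np.eye(d)))          # ~0: iterate is still orthogonal
```

A stochastic variant of such a scheme would replace the exact gradient above with an unbiased estimate computed from mini-batches or sampled perturbations.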
