Unifying Orthogonal Monte Carlo Methods

Many machine learning methods that use Monte Carlo sampling in vector spaces have been shown to improve when samples are conditioned to be mutually orthogonal. Because exact orthogonal coupling of samples is computationally intensive, approximate methods have attracted great interest. In this paper, we present a unifying perspective on many approximate methods by considering Givens transformations, propose new approximate methods based on this framework, and establish the first statistical guarantees for families of approximate methods in kernel approximation. We provide extensive empirical evaluations with guidance for practitioners.
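The core construction behind this family of methods can be illustrated with a minimal sketch: composing random Givens rotations (a Kac-walk-style scheme) yields a matrix that is exactly orthogonal by construction, while enough rotations make its distribution approach that of a Haar-random orthogonal matrix at a fraction of the cost of exact sampling. The function name and parameters below are illustrative, not the paper's actual algorithm.

```python
import numpy as np

def random_givens_product(d, num_rotations, rng):
    """Build an orthogonal d x d matrix as a product of random Givens rotations.

    Each rotation acts on a random coordinate pair (i, j) with a random
    angle, costing O(d) per rotation instead of O(d^2) for a dense multiply.
    """
    Q = np.eye(d)
    for _ in range(num_rotations):
        i, j = rng.choice(d, size=2, replace=False)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        # Rotate rows i and j of Q in place.
        rot = np.array([[c, -s], [s, c]])
        Q[[i, j], :] = rot @ Q[[i, j], :]
    return Q

rng = np.random.default_rng(0)
Q = random_givens_product(8, 200, rng)
# Q is orthogonal up to floating-point error, since every factor is a rotation.
orth_error = np.linalg.norm(Q @ Q.T - np.eye(8))
```

Multiplying Gaussian-length-rescaled rows of such a `Q` against data points would give approximately orthogonally coupled samples; the trade-off studied in this line of work is how few rotations suffice for the downstream estimator (e.g. random-feature kernel approximation) to match exact orthogonal coupling.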
