The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings

We examine a class of embeddings based on structured random matrices with orthogonal rows that can be applied to many machine learning tasks, including dimensionality reduction and kernel approximation. For both the Johnson-Lindenstrauss transform and the angular kernel, we show that we can select matrices yielding guaranteed improved performance in accuracy and/or speed compared to earlier methods. We introduce matrices with complex entries that give significant further accuracy improvement. We provide geometric and Markov chain-based perspectives to help understand the benefits, and empirical results that suggest the approach is helpful in a wider range of applications.
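The core idea of imposing orthogonality on the rows of a random projection can be illustrated with a minimal sketch. The snippet below is not the paper's fast structured construction (e.g. Hadamard-diagonal blocks); it uses a plain QR decomposition as an assumed stand-in, and compares the squared-norm estimation error of an i.i.d. Gaussian Johnson-Lindenstrauss matrix against an orthogonal variant. The dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 256, 64, 500  # ambient dim, embedding dim, number of test points

# Unstructured baseline: i.i.d. Gaussian JL matrix, scaled so that
# E[||G x||^2] = ||x||^2.
G = rng.standard_normal((m, d)) / np.sqrt(m)

# Orthogonal variant: take m orthonormal rows from the QR decomposition of
# a d x d Gaussian matrix, rescaled so squared norms are preserved in
# expectation. (Illustrative construction; structured variants are faster.)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
S = np.sqrt(d / m) * Q[:m]

# Unit test vectors, so the true squared norm is exactly 1.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Mean squared error of the squared-norm estimate under each embedding.
mse_iid = np.mean((np.linalg.norm(X @ G.T, axis=1) ** 2 - 1.0) ** 2)
mse_orth = np.mean((np.linalg.norm(X @ S.T, axis=1) ** 2 - 1.0) ** 2)
print(f"iid Gaussian MSE: {mse_iid:.4f}, orthogonal MSE: {mse_orth:.4f}")
```

With orthonormal rows, the coupling between coordinates removes part of the variance of the norm estimate, which is the kind of guaranteed accuracy improvement the abstract refers to; the same orthogonality benefit carries over to sign-based estimators of the angular kernel.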
