The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings

We examine a class of embeddings based on structured random matrices with orthogonal rows that can be applied to many machine learning tasks, including dimensionality reduction and kernel approximation. For both the Johnson-Lindenstrauss transform and the angular kernel, we show that we can select matrices yielding guaranteed improved performance in accuracy and/or speed compared to earlier methods. We introduce matrices with complex entries that give significant further accuracy improvement. We provide geometric and Markov chain-based perspectives to help understand the benefits, and empirical results that suggest the approach is helpful in a wider range of applications.
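The core idea of imposing orthogonality on the rows of a random projection can be illustrated with a minimal sketch. The snippet below is not the paper's fast structured construction (e.g. Hadamard-diagonal blocks); it uses a plain QR decomposition as an assumed stand-in, and compares the squared-norm estimation error of an i.i.d. Gaussian Johnson-Lindenstrauss matrix against an orthogonal variant. The dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 256, 64, 500  # ambient dim, embedding dim, number of test points

# Unstructured baseline: i.i.d. Gaussian JL matrix, scaled so that
# E[||G x||^2] = ||x||^2.
G = rng.standard_normal((m, d)) / np.sqrt(m)

# Orthogonal variant: take m orthonormal rows from the QR decomposition of
# a d x d Gaussian matrix, rescaled so squared norms are preserved in
# expectation. (Illustrative construction; structured variants are faster.)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
S = np.sqrt(d / m) * Q[:m]

# Unit test vectors, so the true squared norm is exactly 1.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Mean squared error of the squared-norm estimate under each embedding.
mse_iid = np.mean((np.linalg.norm(X @ G.T, axis=1) ** 2 - 1.0) ** 2)
mse_orth = np.mean((np.linalg.norm(X @ S.T, axis=1) ** 2 - 1.0) ** 2)
print(f"iid Gaussian MSE: {mse_iid:.4f}, orthogonal MSE: {mse_orth:.4f}")
```

With orthonormal rows, the coupling between coordinates removes part of the variance of the norm estimate, which is the kind of guaranteed accuracy improvement the abstract refers to; the same orthogonality benefit carries over to sign-based estimators of the angular kernel.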
