Binary embeddings with structured hashed projections

We consider the hashing mechanism for constructing binary embeddings that involves pseudo-random projections followed by nonlinear (sign function) mappings. The pseudo-random projection is described by a matrix in which not all entries are independent random variables; instead, a fixed "budget of randomness" is distributed across the matrix. Such matrices can be stored in sub-quadratic or even linear space, reduce the amount of randomness required (i.e., the number of random values that must be generated), and often lead to computational speed-ups. We prove several theoretical results showing that projections via various structured matrices, followed by nonlinear mappings, accurately preserve the angular distance between input high-dimensional vectors. To the best of our knowledge, these are the first results that give theoretical grounds for the use of general structured matrices in the nonlinear setting. We empirically verify our theoretical findings and show how learning via structured hashed projections affects the performance of a neural network as well as a nearest-neighbor classifier.
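To make the mechanism concrete, here is a minimal sketch of one such structured construction: a circulant Gaussian projection (preceded by a random sign-flip diagonal) applied via FFT, followed by the sign nonlinearity. This is an illustration of the general idea rather than the paper's exact construction, and the function name `circulant_sign_hash` is hypothetical. The check at the end relies on the standard hyperplane-rounding fact that, for Gaussian projections, the expected fraction of disagreeing bits between two codes equals the angular distance theta(x, y) / pi.

```python
import numpy as np

def circulant_sign_hash(x, g, D=None):
    """Binary code sign(C(g) @ (D x)), where C(g) is the circulant matrix
    generated by the vector g; the matrix-vector product is computed in
    O(n log n) via the FFT. D is an optional random +/-1 diagonal that
    helps decorrelate the (structurally dependent) circulant rows."""
    if D is not None:
        x = D * x
    # Circulant product = circular convolution, done in the Fourier domain.
    proj = np.fft.ifft(np.fft.fft(g) * np.fft.fft(x)).real
    return np.sign(proj)

rng = np.random.default_rng(0)
n = 512
g = rng.standard_normal(n)      # the entire "budget of randomness": n Gaussians
D = rng.choice([-1.0, 1.0], n)  # plus n random signs for the diagonal

x = rng.standard_normal(n)
y = x + 0.3 * rng.standard_normal(n)   # a nearby vector

hx, hy = circulant_sign_hash(x, g, D), circulant_sign_hash(y, g, D)
hamming = np.mean(hx != hy)            # normalized Hamming distance
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(f"Hamming/k = {hamming:.3f}  vs  theta/pi = {theta / np.pi:.3f}")
```

Note the savings this structure buys: the circulant matrix is defined by n Gaussian values rather than the n^2 an unstructured Gaussian matrix would need, it is stored in linear space, and the FFT reduces the projection cost from O(n^2) to O(n log n), which is exactly the trade-off the abstract describes.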
