Approximate nearest neighbors via dictionary learning

Approximate nearest neighbor (ANN) search in high-dimensional vector spaces is a fundamental yet challenging problem in many areas of computer science, including computer vision, data mining, and robotics. In this work, we investigate this problem from the perspective of compressive sensing, in particular its dictionary learning aspect. High-dimensional feature vectors are seldom sparse in the feature domain; examples include, but are not limited to, Scale Invariant Feature Transform (SIFT) descriptors, Histograms of Oriented Gradients (HOG), and shape contexts. Compressive sensing suggests that if a vector has dense support in the feature space, there should exist an alternative high-dimensional space in which it is sparse. Dictionary learning techniques exploit this idea by learning an overcomplete projection of the feature space in which the vectors become sparse. The learned dictionary narrows the search for the nearest neighbors of a query vector to the most likely combination of subspaces, indexed by the query's non-zero (active) basis elements. Since the dictionary is generally very large, distinct feature vectors are likely to have distinct sets of active basis elements. Building on this observation, we propose a novel representation of each feature vector as the tuple of its non-zero dictionary indices, which reduces ANN search to hashing these tuples into an index table and thereby dramatically speeds up the search. A drawback of this naive approach is its high sensitivity to feature perturbations, which can arise in two ways: (i) the feature vectors are corrupted by noise, or (ii) the true data vectors themselves undergo perturbations. Existing dictionary learning methods address the first case. In this work, we investigate the second case and approach it from a robust optimization perspective.
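To make the tuple-hashing idea concrete, here is a minimal sketch in Python (standard library only). It assumes a dictionary is already available (a random unit-norm dictionary stands in for a learned one) and computes each sparse code with ISTA, an iterative soft-thresholding solver for the LASSO; ISTA is our substitution, not necessarily the solver used in this work. The helpers `sparse_code`, `support_key`, `build_index`, and `query`, as well as the sizes and the weight `lam=0.1`, are hypothetical and purely illustrative. The support tuple serves as the hash key, and exact distances are computed only inside the matching bucket.

```python
import math
import random
from collections import defaultdict

# Hypothetical sketch: ISTA is a stand-in LASSO solver, not necessarily the one used in this work.


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(D, x, lam, n_iter=200):
    """Approximate min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    D is a list of K atoms (each a list of length d), i.e. D[k] is column k.
    """
    K, d = len(D), len(x)
    # Step size 1/L with L = ||D||_F^2, a cheap upper bound on the largest
    # eigenvalue of D^T D, which is enough for ISTA to converge.
    L = sum(v * v for atom in D for v in atom)
    a = [0.0] * K
    for _ in range(n_iter):
        # Residual r = D a - x.
        r = [sum(D[k][i] * a[k] for k in range(K)) - x[i] for i in range(d)]
        # Gradient step on the quadratic term, then soft-thresholding.
        a = [soft_threshold(a[k] - sum(D[k][i] * r[i] for i in range(d)) / L, lam / L)
             for k in range(K)]
    return a


def support_key(a, tol=1e-8):
    """Hash key: tuple of the indices of the active (non-zero) atoms."""
    return tuple(k for k, v in enumerate(a) if abs(v) > tol)


def build_index(D, data, lam):
    """Map each support tuple to the ids of the database vectors sharing it."""
    table = defaultdict(list)
    for idx, x in enumerate(data):
        table[support_key(sparse_code(D, x, lam))].append(idx)
    return table


def query(D, data, table, q, lam):
    """Look up q's bucket, then rank only that bucket by Euclidean distance."""
    bucket = table.get(support_key(sparse_code(D, q, lam)), [])
    return min(bucket, key=lambda i: math.dist(data[i], q), default=None)


# Usage: a random unit-norm dictionary stands in for a learned one.
rng = random.Random(0)
d, K = 8, 16
D = []
for _ in range(K):
    atom = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in atom))
    D.append([v / norm for v in atom])
data = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(20)]
table = build_index(D, data, lam=0.1)
print(query(D, data, table, data[3], lam=0.1))  # prints 3
```

Because keys must match exactly, a query whose active set differs by even one atom from that of its true neighbor lands in a different (often empty) bucket, which is the perturbation sensitivity noted above.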
Addressing perturbations of the data vectors themselves reduces to learning a dictionary that is robust to feature perturbations, which leads to a novel Robust Dictionary Learning (RDL) framework. In addition, we propose a novel LASSO-based multi-regularization hashing algorithm that exploits the consistency of the non-zero active basis elements across increasing values of the regularization weight. Although our algorithm is generic and applicable across many areas of scientific computing, the experiments in this work focus on improving the speed and accuracy of ANN search for SIFT descriptors, which are high-dimensional (128-D) and among the most widely used local feature descriptors in computer vision. Preliminary results on SIFT datasets show that our algorithm substantially outperforms state-of-the-art ANN techniques.
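The multi-regularization idea can be illustrated with a hypothetical sketch: one hash table per regularization weight λ, queried from the smallest λ (longest, most specific key) to the largest (shortest, coarsest key) until a non-empty bucket is found. The back-off rule, the class name `MultiLambdaIndex`, and the helper names are our assumptions, not the algorithm described here. To stay short, the sketch assumes an orthonormal dictionary, for which the LASSO solution is the soft-thresholded projection a_k = sign(c_k) max(|c_k| - λ, 0) with c = D^T x, so the active sets are exactly nested as λ grows; with an overcomplete learned dictionary an iterative LASSO solver is needed and nesting is not guaranteed.

```python
import math
from collections import defaultdict


def lasso_orthonormal(D, x, lam):
    """Exact LASSO code when the atoms in D (a list of K unit vectors) are
    orthonormal: a_k = sign(c_k) * max(|c_k| - lam, 0), with c_k = <D[k], x>."""
    coeffs = [sum(dk * xi for dk, xi in zip(atom, x)) for atom in D]
    return [math.copysign(max(abs(c) - lam, 0.0), c) for c in coeffs]


def support_key(a):
    """Tuple of active (non-zero) atom indices, used as the hash key."""
    return tuple(k for k, v in enumerate(a) if v != 0.0)


class MultiLambdaIndex:
    """Hypothetical multi-regularization hash index: one table per lambda."""

    def __init__(self, D, data, lams):
        self.D, self.data = D, data
        self.lams = sorted(lams)  # small lambda first: longer, more specific keys
        self.tables = [defaultdict(list) for _ in self.lams]
        for idx, x in enumerate(data):
            for table, lam in zip(self.tables, self.lams):
                table[support_key(lasso_orthonormal(D, x, lam))].append(idx)

    def query(self, q):
        """Return (nearest id, lambda used), backing off to a larger lambda
        (shorter, coarser key) only when the current bucket is empty."""
        for table, lam in zip(self.tables, self.lams):
            bucket = table.get(support_key(lasso_orthonormal(self.D, q, lam)), [])
            if bucket:
                best = min(bucket, key=lambda i: math.dist(self.data[i], q))
                return best, lam
        return None, None


# Usage: the standard basis of R^3 as a (trivially orthonormal) dictionary.
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
data = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [2.0, 1.0, 0.0]]
index = MultiLambdaIndex(D, data, lams=[0.05, 1.5])
print(index.query([1.9, 0.2, 0.1]))  # prints (0, 1.5)
```

Here the perturbed query [1.9, 0.2, 0.1] activates all three atoms at λ = 0.05, so no stored vector shares its key; at λ = 1.5 only atom 0 survives, and the back-off finds vector 0.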
