What Is the Most Efficient Way to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search?

Approximate nearest neighbor search (ANNS) is a basic and important technique used in many tasks such as object recognition. It involves two processes: selecting nearest neighbor candidates and performing a brute-force search of these candidates. Only the former, though, has scope for improvement. Most existing methods approximate the space by quantization, calculate the distances between the query and all the quantized values (e.g., clusters or bit sequences), and select a fixed number of candidates close to the query. The performance of a method is then evaluated based on accuracy as a function of the number of candidates. This evaluation seems rational but poses a serious problem: it ignores the computational cost of the selection process. In this paper, we propose a new ANNS method that takes the cost of the selection process into account. Whereas existing methods employ computationally expensive techniques such as comparison sorts and heaps, the proposed method does not, which makes the search significantly more efficient. We have succeeded in reducing computation times by one-third compared with the state-of-the-art in an experiment using 100 million SIFT features.
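
The conventional two-stage pipeline described above can be sketched as follows. This is only an illustrative sketch of the existing approach the abstract criticizes, not the proposed method. The function names (build_index, select_candidates, search), the parameter n_probe, and the toy data are all hypothetical, and the sketch fixes the number of probed clusters rather than the number of candidates, which is a simplifying assumption.

```python
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_index(points, centroids):
    """Quantize the space: assign each point index to its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, p in enumerate(points):
        nearest = min(range(len(centroids)),
                      key=lambda c: sq_dist(p, centroids[c]))
        buckets[nearest].append(i)
    return buckets


def select_candidates(query, centroids, buckets, n_probe):
    """Conventional selection: compute the distance from the query to
    every centroid, then pick the n_probe closest ones with a heap.
    This heap-based step is the selection cost the abstract refers to."""
    dists = [(sq_dist(query, c), j) for j, c in enumerate(centroids)]
    closest = heapq.nsmallest(n_probe, dists)
    return [i for _, j in closest for i in buckets[j]]


def search(query, points, centroids, buckets, n_probe=1):
    """Brute-force search restricted to the selected candidates.
    Assumes at least one candidate is selected."""
    candidates = select_candidates(query, centroids, buckets, n_probe)
    return min(candidates, key=lambda i: sq_dist(query, points[i]))


# Toy data (hypothetical): two well-separated clusters in 2-D.
points = [(0, 0), (1, 0), (10, 10), (11, 10)]
centroids = [(0.5, 0), (10.5, 10)]
buckets = build_index(points, centroids)
result = search((9, 9), points, centroids, buckets, n_probe=1)
```

With n_probe=1 only the bucket of the closest centroid is scanned, so accuracy depends on how many candidates the selection step returns, while the selection step itself adds a cost that an accuracy-versus-candidates evaluation does not capture.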
