Reconfigurable Inverted Index

Existing approximate nearest neighbor search systems suffer from two fundamental problems that are of practical importance but have not received sufficient attention from the research community. First, although existing systems perform well when searching the whole database, it is difficult to run a search over a subset of the database. Second, there has been no discussion of how performance degrades after many new items have been added to a system. We develop a reconfigurable inverted index (Rii) to resolve these two issues. Based on the standard IVFADC system, we design a data layout in which items are stored linearly. This enables us to run a subset search efficiently by switching the search method to a linear PQ scan when the subset is small. Owing to the linear layout, the data structure can be dynamically adjusted after new items are added, maintaining the fast speed of the system. Extensive comparisons show that Rii achieves performance comparable to state-of-the-art systems such as Faiss.
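The switch between the two search paths can be illustrated with a minimal sketch. It is not the Rii implementation: the function names, the `threshold` parameter, and the choice to PQ-encode raw vectors (rather than residuals) so that one set of codes serves both paths are all assumptions made for illustration.

```python
import numpy as np

def pq_encode(X, codebooks):
    """Map each sub-vector to its nearest codeword; returns (N, M) uint8 codes."""
    M, K, Ds = codebooks.shape
    codes = np.empty((len(X), M), dtype=np.uint8)
    for m in range(M):
        sub = X[:, m * Ds:(m + 1) * Ds]
        d = ((sub[:, None, :] - codebooks[m][None]) ** 2).sum(-1)
        codes[:, m] = d.argmin(1)
    return codes

def adc_distances(q, codebooks, codes):
    """Asymmetric distances from query q to PQ codes via a lookup table."""
    M, K, Ds = codebooks.shape
    table = ((q.reshape(M, 1, Ds) - codebooks) ** 2).sum(-1)  # (M, K)
    return table[np.arange(M), codes].sum(1)

def build_lists(cells, n_cells):
    """Posting lists hold only item ids; the codes stay in one flat array."""
    return [np.flatnonzero(cells == c) for c in range(n_cells)]

def search(q, codebooks, coarse, codes, lists, topk,
           subset=None, nprobe=2, threshold=50):
    """Hypothetical switch: small subsets are scanned linearly,
    everything else goes through the inverted lists."""
    if subset is not None and len(subset) < threshold:
        cand = np.asarray(subset)                     # linear PQ scan
    else:
        nearest = ((coarse - q) ** 2).sum(1).argsort()[:nprobe]
        cand = np.concatenate([lists[c] for c in nearest])
        if subset is not None:                        # filter by membership
            member = np.zeros(len(codes), dtype=bool)
            member[np.asarray(subset)] = True
            cand = cand[member[cand]]
    d = adc_distances(q, codebooks, codes[cand])
    order = np.argsort(d, kind="stable")[:topk]
    return cand[order], d[order]
```

In this sketch the codes and cell labels live in flat arrays, so the posting lists can be rebuilt in a single pass with `build_lists` after new items are appended. The actual criterion Rii uses to choose between the two paths is not given in the abstract; `threshold` here is only a placeholder.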
