Fast Nearest Neighbor Search Based on Approximate k-NN Graph

Nearest neighbor search is a challenging problem that has been studied for several decades. It has recently become more pressing as big data problems arise in a variety of fields. Recent studies show that graph-based methods address it effectively. However, existing solutions construct the k-nearest neighbor graph inefficiently, and this construction step becomes the processing bottleneck. To address this, a novel k-nearest neighbor graph construction method is proposed. Extensive experiments show that it achieves satisfactory performance on different datasets while reducing the graph construction cost to a low level. In addition, a comparative study of approximate nearest neighbor search methods is presented, covering space-partitioning, compression-based, hashing, and graph-based approaches. In these experiments, our method achieves the best trade-off between search quality and computational cost.
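
To make the bottleneck concrete, the following is a minimal sketch of the generic graph-based pipeline, not the construction method proposed in the paper. It assumes squared Euclidean distance, builds an exact k-NN graph by brute force (the quadratic step that approximate construction methods aim to avoid), and answers queries by greedy hill-climbing with random restarts. All function names (`build_knn_graph`, `greedy_search`, `search_with_restarts`) are hypothetical and chosen for illustration only.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(points, k):
    """Exact k-NN graph by brute force.

    Every point is compared with every other point, so the cost is
    O(n^2) distance computations. This is the expensive step.
    """
    graph = []
    for i, p in enumerate(points):
        dists = [(sq_dist(p, q), j) for j, q in enumerate(points) if j != i]
        graph.append([j for _, j in heapq.nsmallest(k, dists)])
    return graph


def greedy_search(points, graph, query, start):
    """Hill-climb from `start`: move to the neighbor closest to the query
    until no neighbor is closer than the current node."""
    current = start
    current_dist = sq_dist(points[current], query)
    while True:
        best, best_dist = current, current_dist
        for j in graph[current]:
            d = sq_dist(points[j], query)
            if d < best_dist:
                best, best_dist = j, d
        if best == current:
            return current, current_dist
        current, current_dist = best, best_dist


def search_with_restarts(points, graph, query, restarts, rng):
    """Run greedy search from several random start nodes and keep the best
    result, which reduces the risk of stopping at a poor local minimum."""
    results = [
        greedy_search(points, graph, query, rng.randrange(len(points)))
        for _ in range(restarts)
    ]
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.random(), rng.random()) for _ in range(200)]
    knn = build_knn_graph(data, k=8)
    idx, dist = search_with_restarts(data, knn, (0.5, 0.5), restarts=5, rng=rng)
    print(idx, dist)
```

The search itself touches only a small number of nodes per query, whereas the brute-force construction compares all pairs of points. That imbalance is why the cost of graph construction, rather than query time, tends to dominate in this family of methods.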
