Large-Scale Approximate k-NN Graph Construction on GPU

The k-nearest neighbor (k-NN) graph is a key data structure in many disciplines, such as manifold learning, machine learning, and information retrieval. NN-Descent was proposed as an effective solution to the graph construction problem. However, it cannot be ported directly to the GPU because of the intensive memory accesses the approach requires. In this paper, NN-Descent is redesigned to fit the GPU architecture. In particular, the number of memory accesses is reduced significantly, and the redesign fully exploits the parallelism of the GPU hardware. At the same time, the generality and simplicity of NN-Descent are preserved. In addition, a simple but effective k-NN graph merge approach is presented, which allows two graphs to be merged efficiently on GPUs. More importantly, it makes the construction of high-quality k-NN graphs tractable for datasets that do not fit in GPU memory. The results show that our approach is 100-250× faster than single-thread NN-Descent and 2.5-5× faster than existing GPU-based approaches.
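
For readers unfamiliar with the baseline, the core idea of NN-Descent (a neighbor of a neighbor is likely to be a neighbor) can be sketched in a few lines of single-threaded Python. This is only an illustrative sketch of the basic CPU algorithm, not the GPU redesign or the graph merge approach described above. It assumes squared Euclidean distance, omits the neighbor sampling and new/old flags used in practical implementations, and the names `nn_descent`, `try_insert`, and `brute_force_knn` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed metric; NN-Descent is generic)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def try_insert(nbrs, d, j, k):
    """Insert (d, j) into a sorted k-NN list if it improves it. Returns 1 on update."""
    if d >= nbrs[-1][0] or any(j == x for _, x in nbrs):
        return 0
    nbrs.append((d, j))
    nbrs.sort()
    del nbrs[k:]  # keep only the k closest entries
    return 1


def nn_descent(points, k, max_iters=10, seed=0):
    """Simplified NN-Descent: refine a random graph by local joins."""
    rng = random.Random(seed)
    n = len(points)
    # Start from a random graph; each list holds (distance, neighbor_id), ascending.
    graph = []
    for i in range(n):
        ids = rng.sample([j for j in range(n) if j != i], k)
        graph.append(sorted((sq_dist(points[i], points[j]), j) for j in ids))
    for _ in range(max_iters):
        # Neighborhood of v = its current neighbors plus its reverse neighbors.
        adj = [set(j for _, j in graph[i]) for i in range(n)]
        for i in range(n):
            for _, j in graph[i]:
                adj[j].add(i)
        updates = 0
        # Local join: every pair inside a neighborhood is a candidate edge.
        for v in range(n):
            members = list(adj[v])
            for x in range(len(members)):
                for y in range(x + 1, len(members)):
                    a, b = members[x], members[y]
                    d = sq_dist(points[a], points[b])
                    updates += try_insert(graph[a], d, b, k)
                    updates += try_insert(graph[b], d, a, k)
        if updates == 0:  # converged: no list changed in this round
            break
    return [[j for _, j in nbrs] for nbrs in graph]


def brute_force_knn(points, k):
    """Exact k-NN graph, used only to measure the recall of the approximation."""
    graph = []
    for i, p in enumerate(points):
        cands = sorted((sq_dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in cands[:k]])
    return graph
```

Each local join reads and updates the neighbor lists of many different points, which gives a rough sense of the scattered memory accesses that make a direct GPU port difficult.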
