Fast approximate k-nearest neighbours search using GPGPU

The k-nearest neighbours (k-NN) search is one of the most important non-parametric methods used in data retrieval and similarity tasks. In recent years, demand has grown for fast k-NN processing of large amounts of high-dimensional data. Locality-sensitive hashing (LSH) is a viable way to compute approximate nearest neighbours (ANN) quickly and with reasonable accuracy. This chapter presents a novel GPGPU parallelisation of locality-sensitive hashing, based on the multi-probe variant. The method was implemented on the CUDA platform to construct a k-ANN graph, and was compared with the state-of-the-art CPU-based k-ANN method and two GPU-based k-NN methods on large, multidimensional data sets. The experimental results show that the proposed method achieves a speed-up of 30× or more over the CPU-based approximate method, whilst retaining a high recall rate.
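
To make the idea of hashing-based approximate search concrete, the following is a minimal, sequential CPU sketch of basic LSH for Euclidean distance; it is not the chapter's CUDA implementation and it omits multi-probing. It assumes the common p-stable hash form h(v) = floor((a·v + b) / w). All names and parameters (`make_hash`, `build_tables`, `knn_approx`, `num_proj`, `width`) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, num_proj, width, rng):
    """Draw one compound hash g(v) = (h_1(v), ..., h_m(v)).

    Assumed form: h_i(v) = floor((a_i . v + b_i) / width), with a_i drawn
    from a Gaussian (2-stable) distribution and b_i uniform in [0, width).
    """
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_proj)]
    offsets = [rng.uniform(0.0, width) for _ in range(num_proj)]

    def g(v):
        return tuple(
            math.floor((sum(a * x for a, x in zip(plane, v)) + b) / width)
            for plane, b in zip(planes, offsets)
        )

    return g


def build_tables(points, hashes):
    """Build one hash table per compound hash: bucket key -> point indices."""
    tables = []
    for g in hashes:
        table = defaultdict(list)
        for idx, p in enumerate(points):
            table[g(p)].append(idx)
        tables.append(table)
    return tables


def knn_approx(query_idx, points, hashes, tables, k):
    """Return up to k approximate neighbours of points[query_idx].

    Candidates are the points sharing a bucket with the query in any table;
    only those candidates are ranked by exact Euclidean distance.
    """
    q = points[query_idx]
    candidates = set()
    for g, table in zip(hashes, tables):
        candidates.update(table.get(g(q), ()))
    candidates.discard(query_idx)  # a point is not its own neighbour
    ranked = sorted(candidates, key=lambda i: math.dist(q, points[i]))
    return ranked[:k]
```

Calling `knn_approx` for every point index yields one row of a k-ANN graph per point. A multi-probe variant would additionally look up buckets whose keys are close to `g(q)`, which typically lets fewer tables reach a similar recall.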
