Online multimedia retrieval on CPU-GPU platforms with adaptive work partition

Abstract: Nearest neighbor search is a core operation in many online multimedia services. These services must handle very large databases while minimizing the query response times observed by users. This is especially difficult because the services face fluctuating query workloads (rates), so they must adapt at run time to keep response times low as the load varies. In this paper, we address these challenges with a distributed-memory parallelization of product quantization nearest neighbor search, also known as IVFADC, for hybrid CPU–GPU machines. Our parallel IVFADC implements an out-of-GPU-memory execution scheme that lets the GPU be used for databases whose index does not fit in GPU memory, which is crucial for searching very large databases. The careful use of CPU and GPU with work stealing led to an average response time reduction of 2.4× compared with using the GPU only. In addition, our approach for adapting the system to fluctuating loads, called Dynamic Query Processing Policy (DQPP), reduced response time by up to 5× relative to the best static (BS) policy under moderate loads. The system attained high query processing rates and near-linear scalability in all experiments. We evaluated it on a machine with up to 256 NVIDIA V100 GPUs, processing a database of 256 billion SIFT feature vectors.
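For readers unfamiliar with IVFADC, the following is a minimal, sequential, pure-Python sketch of the general technique: a coarse quantizer assigns each vector to an inverted list, the residual is encoded with product quantization, and queries are answered with asymmetric distance computation over per-subspace lookup tables. It is not the parallel CPU–GPU implementation described in the abstract. The class name `ToyIVFADC`, the helper functions, and the hand-picked centroids and codebooks are all illustrative assumptions; a real index would learn them with k-means.

```python
import heapq

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codebook entry closest to v (first one wins on ties)."""
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i], v))

class ToyIVFADC:
    """Hypothetical toy index; centroids and codebooks are supplied, not trained."""

    def __init__(self, coarse, pq_codebooks):
        self.coarse = coarse                    # coarse centroids, dimension D
        self.pq = pq_codebooks                  # m codebooks over D/m-dim sub-vectors
        self.m = len(pq_codebooks)
        self.ds = len(coarse[0]) // self.m      # sub-vector dimension
        self.lists = [[] for _ in coarse]       # inverted lists of (id, PQ code)

    def _split(self, v):
        return [v[j * self.ds:(j + 1) * self.ds] for j in range(self.m)]

    def add(self, vid, v):
        c = nearest(self.coarse, v)
        residual = [x - y for x, y in zip(v, self.coarse[c])]
        code = tuple(nearest(cb, sub)
                     for cb, sub in zip(self.pq, self._split(residual)))
        self.lists[c].append((vid, code))

    def search(self, q, k, nprobe):
        # Visit only the nprobe inverted lists whose centroids are closest to q.
        order = sorted(range(len(self.coarse)),
                       key=lambda i: sqdist(self.coarse[i], q))
        candidates = []
        for c in order[:nprobe]:
            residual = [x - y for x, y in zip(q, self.coarse[c])]
            # One lookup table per subspace: query residual vs. every sub-centroid.
            tables = [[sqdist(sub, cent) for cent in cb]
                      for cb, sub in zip(self.pq, self._split(residual))]
            for vid, code in self.lists[c]:
                d = sum(tables[j][code[j]] for j in range(self.m))
                candidates.append((d, vid))
        return heapq.nsmallest(k, candidates)   # (approx. distance, id) pairs

# Assumed toy data: 4-dimensional vectors, 2 coarse cells, m = 2 subspaces.
coarse = [[0, 0, 0, 0], [10, 10, 10, 10]]
sub_codebook = [[0, 0], [1, 1], [-1, -1], [1, -1]]
index = ToyIVFADC(coarse, [sub_codebook, sub_codebook])
for vid, vec in enumerate([[0, 0, 0, 0], [1, 1, -1, -1],
                           [10, 10, 10, 10], [11, 11, 9, 9]]):
    index.add(vid, vec)

top2 = index.search([1, 1, -1, -1], k=2, nprobe=1)
print([vid for _, vid in top2])
```

In this sketch, `nprobe` trades accuracy for work: a larger value scans more inverted lists per query. The per-list scans are independent of each other, which is the kind of work that a parallel implementation can divide between devices.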
