Approximate similarity search for online multimedia services on distributed CPU–GPU platforms

Similarity search in high-dimensional spaces is a pivotal operation for several database applications, including online content-based multimedia services. With the growing popularity of multimedia applications, these services face new challenges regarding (1) the very large and growing volumes of data to be indexed and searched and (2) the need to reduce the response times observed by end-users. In addition, the nature of the interactions between users and online services causes query request rates to fluctuate throughout execution, so a similarity search engine must adapt how it uses the computing platform in order to minimize response times. In this work, we address these challenges with Hypercurves, a flexible framework for answering approximate k-nearest neighbor (kNN) queries over very large multimedia databases. Hypercurves executes in hybrid CPU–GPU environments and attains massive query-processing rates through the cooperative use of these devices. Hypercurves also changes its CPU–GPU task partitioning dynamically according to the observed load, aiming for optimal response times. In our empirical evaluation, dynamic task partitioning reduced query response times by approximately 50% compared to the best static task partition. Owing to a probabilistic proof of equivalence to the sequential kNN algorithm, the CPU–GPU execution of Hypercurves in distributed (multi-node) environments can be aggressively optimized, attaining superlinear scalability while still guaranteeing, with high probability, results at least as good as those of the sequential algorithm.
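The distributed result compares multi-node execution against the sequential kNN algorithm. One standard fact relevant to such comparisons is the merge step of data-partitioned kNN search: if each node returns its own k best candidates, the k best entries of the merged lists are exactly the k best over the whole database. The Python sketch below illustrates only that property, assuming exact brute-force search within each partition and Euclidean distance. The function names `local_knn` and `merge_knn` are hypothetical, and this is not the Hypercurves implementation, whose per-node search is approximate and whose guarantee is probabilistic.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Candidate = Tuple[float, int]  # (distance, point id)


def local_knn(query: Point, partition: Sequence[Tuple[int, Point]], k: int) -> List[Candidate]:
    """Exact brute-force kNN inside one node's partition.

    Assumption: stands in for whatever (approximate) index a real node would use.
    Returns up to k (distance, id) pairs, closest first; ties broken by id.
    """
    return heapq.nsmallest(k, ((math.dist(query, p), pid) for pid, p in partition))


def merge_knn(partials: Sequence[List[Candidate]], k: int) -> List[Candidate]:
    """Combine the per-node top-k lists into a single global top-k list."""
    return heapq.nsmallest(k, (c for part in partials for c in part))


# Usage: ten points on a line, split round-robin across three hypothetical nodes.
data = [(i, (float(i), 0.0)) for i in range(10)]
partitions = [data[0::3], data[1::3], data[2::3]]
query = (4.2, 0.0)
k = 3

partials = [local_knn(query, part, k) for part in partitions]
print([pid for _, pid in merge_knn(partials, k)])  # prints [4, 5, 3]
```

No true neighbor can be lost in the merge: any point in the global top-k is necessarily in the top-k of its own partition, so it reaches the merging step. With approximate per-node search this reasoning no longer holds deterministically, which is why a probabilistic argument is needed in that setting.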
