Pipeline strategies to accelerate range query processing on a multi-GPU environment

Similarity search is attracting growing interest because its methods apply to many areas of computer science and engineering, such as voice and image recognition and text retrieval. When large volumes of data are processed, however, query response time can be quite high, and mechanisms are needed to reduce the average query response time significantly. Parallelizing the processing of metric structures is therefore an interesting field of research. Most existing work in this area targets classical distributed or shared-memory platforms. Modern GPU and multi-GPU systems, however, offer a very good cost/performance ratio compared with multiprocessor or multicomputer platforms, which are usually more expensive, and they are gaining significance and popularity within the scientific computing community. More recently, GPUs have been proposed for evaluating similarity queries on indexes that remain statically stored in GPU memory. In this paper we propose two pipelines to accelerate similarity query processing on datasets too large to fit in GPU memory. The first pipeline combines CPU cores and GPUs in a hybrid algorithm, and the second is implemented on the GPU. The results show that the best performance is achieved when both pipelines are used at the same time.
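
As a rough illustration of the general idea of pipelining a range query over a dataset that does not fit in device memory, the following minimal sketch streams the dataset in chunks through two overlapping stages. It is not the algorithm proposed in the paper: it runs on the CPU only, uses a brute-force scan instead of a metric index, and every name in it (`range_query_pipelined`, `chunks`, `depth`, `euclidean`) is a hypothetical choice made for this example. The loader thread stands in for the host-to-device transfer of the next chunk, and the main loop stands in for the distance computation on the chunk that is already resident.

```python
import math
import queue
import threading


def euclidean(a, b):
    """Euclidean distance, used here as an example metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query_pipelined(chunks, query, radius, dist=euclidean, depth=2):
    """Return every object within `radius` of `query`, streaming over `chunks`.

    Assumption: `chunks` is an iterable of lists of objects, each small
    enough to be held in (simulated) device memory at once.
    """
    staged = queue.Queue(maxsize=depth)  # bounded buffer between the stages
    done = object()                      # sentinel marking the end of the stream

    def loader():
        # Stage 1: stands in for copying the next chunk to device memory.
        for chunk in chunks:
            staged.put(list(chunk))
        staged.put(done)

    threading.Thread(target=loader, daemon=True).start()

    results = []
    while True:
        chunk = staged.get()
        if chunk is done:
            break
        # Stage 2: stands in for the distance kernel over one resident chunk.
        results.extend(obj for obj in chunk if dist(obj, query) <= radius)
    return results


chunks = [[(0.0, 0.0), (3.0, 4.0)], [(1.0, 1.0), (10.0, 10.0)]]
hits = range_query_pipelined(chunks, query=(0.0, 0.0), radius=2.0)
```

The bounded queue is what creates the overlap: while one chunk is being scanned, the loader can already stage up to `depth` further chunks, so neither stage waits for the whole dataset.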
