An efficiency study of a pivot-based algorithm for similarity search on a heterogeneous platform

Graphics processing units (GPUs) have become firmly established as accelerators for general-purpose applications. Search algorithms over large databases are a clear example of applications that benefit from computing platforms based on these devices. Obtaining an efficient implementation of a given code on such platforms requires taking their features into account. However, the characteristics of the application, together with certain overheads that these platforms still introduce, mean that using these devices does not always yield significant time reductions. In this paper, we show how different properties of current GPUs can be exploited to improve a version of the general metric structure similarity search algorithm introduced by the authors, and we compare it with a multithreaded version of the same algorithm running on conventional processors. The analysis of the results provides relevant data for determining the most appropriate computing platform.
