Permutation Index and GPU to Solve Efficiently Many Queries

Similarity search is a fundamental operation for applications that deal with multimedia data. In a multimedia database it is meaningless to look for elements exactly equal to a given query object. Instead, we need to measure the similarity (or dissimilarity) between the query object and each object of the database. The similarity search problem can be formally defined through the concept of metric space, which provides a formal framework that is independent of the application domain. In a metric database, objects from a metric space are stored so that similarity queries about them can be answered efficiently. In general, search efficiency is understood as minimizing the number of distance calculations required to answer a query. Therefore, the goal is to preprocess the dataset by building an index, so that queries can be answered with as few distance computations as possible. However, for very large metric databases it is not enough to preprocess the dataset by building an index; it is also necessary to speed up the queries by using high-performance computing, such as GPUs. In this work we present an implementation on a pure GPU architecture that builds the Permutation Index, which is used for approximate similarity search on databases of different kinds of data. Our proposal is able to solve many queries at the same time.
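To make the idea of a permutation-based index concrete, the following is a minimal sequential sketch in Python, not the GPU implementation described in this work. It assumes the commonly described scheme in which each object is represented by the order in which it "sees" a small set of reference objects (permutants), and in which candidates are ranked by the Spearman footrule between permutations before a fraction of them is compared with the real distance. All names (`permutation_of`, `build_index`, `knn_batch`, the `fraction` parameter) are hypothetical and chosen only for illustration.

```python
def euclidean(a, b):
    # Example metric; any distance satisfying the metric axioms could be used.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def permutation_of(obj, permutants, dist):
    # Rank the permutants by their distance to obj and return, for each
    # permutant, its position in that ranking (the inverse permutation).
    order = sorted(range(len(permutants)), key=lambda i: dist(obj, permutants[i]))
    inverse = [0] * len(permutants)
    for rank, p in enumerate(order):
        inverse[p] = rank
    return inverse

def build_index(db, permutants, dist):
    # The index stores one permutation per database object.
    return [permutation_of(o, permutants, dist) for o in db]

def footrule(p, q):
    # Spearman footrule: sum of position differences between two permutations.
    return sum(abs(a - b) for a, b in zip(p, q))

def knn_batch(queries, db, permutants, index, dist, k, fraction):
    # Answer several k-NN queries. Each query is independent of the others,
    # which is what makes a batch of queries a natural target for parallelism.
    budget = max(k, int(fraction * len(db)))  # assumed candidate budget
    results = []
    for q in queries:
        qp = permutation_of(q, permutants, dist)
        # Order objects by permutation similarity (no real distances needed).
        candidates = sorted(range(len(db)), key=lambda i: footrule(qp, index[i]))
        # Compute real distances only for the most promising candidates.
        best = sorted(candidates[:budget], key=lambda i: dist(q, db[i]))[:k]
        results.append(best)
    return results
```

In this sketch, `fraction` controls the trade-off between cost and answer quality: with `fraction = 1.0` every object is compared and the answer is exact, while smaller values compute fewer real distances and return an approximate answer. The loop over `queries` has no dependencies between iterations, so under these assumptions each query (and each object's permutation) could be processed independently.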
