Accelerating approximate similarity queries using genetic algorithms

Searching for the exact answer to a similarity query is expensive in terms of computational resources such as memory and processing time. Moreover, comparison operations over multimedia data are even more expensive than over traditional data such as numbers and short character strings. Therefore, when multimedia data are compared, the comparisons usually consider properties extracted from the data elements. As a result, exact queries over this kind of data return answers that are exact with respect to the compared properties, but not necessarily exact with respect to the multimedia data itself. For example, searching for images with similar colors returns the images whose color histograms are most similar, but the retrieved images can be very different regarding, for instance, the shape of the objects pictured. Hence, for applications dealing with complex data types, trading answer exactness for shorter query response time can be worthwhile. In this paper we propose using techniques based on genetic algorithms to retrieve data indexed by a metric access method within a limited, user-defined amount of time. We show that these techniques lead to much faster execution without reducing the quality of the answer. We also present an experimental evaluation using real datasets, showing that suitable results can be obtained in a fraction of the time required to obtain the exact answer.
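
To make the idea of a time-bounded genetic search concrete, here is a minimal sketch under our own assumptions, not the authors' algorithm. Each object is a plain numeric feature vector compared with the L1 distance (standing in for, e.g., a color histogram), and the data sit in a flat list rather than in a metric access method. The function names `ga_knn` and `l1_distance`, and the GA design (an individual is a set of k candidate indices, truncation selection, one-point crossover, random-replacement mutation), are hypothetical choices. The search stops when the user-defined `time_budget` (in seconds) expires and returns the best k candidates found so far.

```python
import random
import time
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance between two feature vectors, e.g. color histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ga_knn(data: List[Sequence[float]], query: Sequence[float], k: int,
           time_budget: float, pop_size: int = 20, mutation_rate: float = 0.2,
           seed: int = 0) -> List[int]:
    """Hypothetical approximate k-NN: evolve sets of k object indices until the
    time budget (seconds) expires; return the best set found, nearest first.
    Requires k <= len(data) and pop_size >= 4."""
    rng = random.Random(seed)
    n = len(data)
    cache = {}  # memoize distances so each object is compared to the query at most once

    def dist(i: int) -> float:
        if i not in cache:
            cache[i] = l1_distance(data[i], query)
        return cache[i]

    def fitness(ind: List[int]) -> float:
        # Lower is better: total distance of the k candidates to the query.
        return sum(dist(i) for i in ind)

    def repair(genes: List[int]) -> List[int]:
        # Drop duplicate indices and refill with random unused ones,
        # so every individual holds exactly k distinct objects.
        unique = list(dict.fromkeys(genes))[:k]
        while len(unique) < k:
            c = rng.randrange(n)
            if c not in unique:
                unique.append(c)
        return unique

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    deadline = time.perf_counter() + time_budget
    while time.perf_counter() < deadline:
        population.sort(key=fitness)
        elite = population[: pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, k) if k > 1 else 1
            child = p1[:cut] + p2[cut:]  # one-point crossover
            if rng.random() < mutation_rate:
                # Mutation: replace one gene with a random object index.
                child[rng.randrange(k)] = rng.randrange(n)
            children.append(repair(child))
        population = elite + children

    best = min(population, key=fitness)
    return sorted(best, key=dist)
```

For example, `ga_knn(data, query, k=5, time_budget=0.05)` returns five object indices ordered by their distance to `query`. A larger `time_budget` lets more generations run, which tends to bring the answer closer to the exact k nearest neighbors, though this sketch gives no such guarantee.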
