Efficient Similarity Search in Metric Spaces with Cluster Reduction

Clustering-based methods for searching in metric spaces partition the space into a set of disjoint clusters. When solving a query, some clusters are discarded without comparing their objects with the query object, and clusters that cannot be discarded are searched exhaustively. In this paper we propose a new strategy and algorithms for clustering-based methods that avoid the exhaustive search within clusters that cannot be discarded, at the cost of storing some extra information in the index. This new strategy is based on progressively reducing the cluster until it can be discarded from the result. We refer to this approach as cluster reduction. We present the algorithms for range and kNN search. The results obtained in an experimental evaluation with synthetic and real collections show that the search cost can be reduced by approximately 13% to 25% with respect to existing methods.
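
To make the starting point concrete, the following is a minimal sketch of the baseline behaviour described above: a range search that discards whole clusters with the triangle inequality and exhaustively scans the rest. It is not the cluster reduction algorithm, whose details are not given in this abstract. The names `Cluster`, `build_clusters` and `range_search`, the use of a covering radius per cluster, and the Euclidean distance are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function satisfying the metric axioms would do."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class Cluster:
    center: Point
    radius: float          # covering radius: max distance from center to a member
    members: List[Point]


def build_clusters(points: Sequence[Point], centers: Sequence[Point],
                   dist: Distance) -> List[Cluster]:
    """Assign each point to its nearest center (hypothetical, simplistic build)."""
    clusters = [Cluster(center=c, radius=0.0, members=[]) for c in centers]
    for p in points:
        nearest = min(clusters, key=lambda cl: dist(p, cl.center))
        nearest.members.append(p)
        nearest.radius = max(nearest.radius, dist(p, nearest.center))
    return clusters


def range_search(clusters: Sequence[Cluster], q: Point, r: float,
                 dist: Distance) -> Tuple[List[Point], int]:
    """Return the objects within distance r of q and the number of distance evaluations."""
    results: List[Point] = []
    evaluations = 0
    for cl in clusters:
        d_qc = dist(q, cl.center)
        evaluations += 1
        # Triangle inequality: every member p satisfies
        # dist(q, p) >= d_qc - cl.radius, so the cluster cannot
        # contain a result when that lower bound exceeds r.
        if d_qc - cl.radius > r:
            continue
        # The cluster could not be discarded: scan it exhaustively.
        for p in cl.members:
            evaluations += 1
            if dist(q, p) <= r:
                results.append(p)
    return results, evaluations
```

The exhaustive inner loop is the cost that the proposed strategy aims to avoid; the evaluation counter is included only so that this baseline cost can be observed.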
