A flexible framework to ease nearest neighbor search in multidimensional data spaces

Abstract

Similarity search is a very active area of research because of its usefulness in many modern applications, such as content-based image retrieval (CBIR), time series, spatial databases, data mining and multimedia databases in general. The usual way to perform a similarity search is to map the objects to feature vectors and to model the search as a nearest neighbor query in the multidimensional space where the vectors reside. The two critical issues in this process are the distance function used to measure the proximity between vectors and the index method used to accelerate the search. In this paper we propose a formal framework for similarity search that gives the user a high degree of freedom in choosing both the distance function and the index structure used to organize the feature space. More specifically, we introduce a function that can approximate potentially any distance function and that can be used in conjunction with index structures that divide the feature space into multidimensional rectangular regions. Use cases and experimental work are presented to demonstrate the applicability and the overhead of the framework.
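
The abstract rests on two interacting pieces: a pluggable distance function and an index that partitions the feature space into rectangular regions. The sketch below shows how they meet in a standard best-first k-nearest-neighbor traversal, where each region is ordered by a lower bound on the distance from the query to any point inside it. It is not the paper's approximation function. It assumes the distance is coordinate-wise monotone (as Minkowski L_p metrics are), so clamping the query onto a rectangle yields a valid lower bound. The names `Region`, `minkowski`, `mindist` and `knn` are hypothetical and chosen for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Distance = Callable[[Sequence[float], Sequence[float]], float]


@dataclass
class Region:
    """Hypothetical index node: an axis-aligned rectangle [low, high] holding
    child regions and/or data points that lie inside the rectangle."""
    low: Vector
    high: Vector
    children: List["Region"] = field(default_factory=list)
    points: List[Vector] = field(default_factory=list)


def minkowski(p: float) -> Distance:
    """Return an L_p distance (assumed example of a coordinate-wise monotone distance)."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
    return dist


def mindist(query: Vector, region: Region, dist: Distance) -> float:
    """Lower bound on dist(query, x) for every x inside the rectangle.

    Clamping each coordinate gives the rectangle's closest point; this is a
    valid bound only because the distance is assumed coordinate-wise monotone."""
    closest = tuple(min(max(q, lo), hi)
                    for q, lo, hi in zip(query, region.low, region.high))
    return dist(query, closest)


def knn(root: Region, query: Vector, k: int, dist: Distance) -> List[Tuple[float, Vector]]:
    """Best-first k-NN: always expand the queue entry with the smallest key.

    A point popped from the queue is final, because every remaining region's
    lower bound and every remaining point's distance is at least as large."""
    tie = itertools.count()  # unique tie-breaker so the heap never compares nodes
    heap = [(mindist(query, root, dist), next(tie), False, root)]
    result: List[Tuple[float, Vector]] = []
    while heap and len(result) < k:
        key, _, is_point, item = heapq.heappop(heap)
        if is_point:
            result.append((key, item))
            continue
        for pt in item.points:
            heapq.heappush(heap, (dist(query, pt), next(tie), True, pt))
        for child in item.children:
            heapq.heappush(heap, (mindist(query, child, dist), next(tie), False, child))
    return result


# Usage: a toy two-leaf index; the distance is swapped without rebuilding it.
leaf_a = Region((0.0, 0.0), (1.0, 1.0), points=[(0.2, 0.3), (0.9, 0.9)])
leaf_b = Region((2.0, 2.0), (3.0, 3.0), points=[(2.1, 2.0), (3.0, 2.5)])
root = Region((0.0, 0.0), (3.0, 3.0), children=[leaf_a, leaf_b])
for order in (1, 2):
    print(order, knn(root, (1.0, 1.0), 2, minkowski(order)))
```

The lower bound is the crux of the design. A distance that is not coordinate-wise monotone, such as a quadratic-form distance, breaks the clamping argument. That is the kind of case a framework that approximates general distance functions over rectangular regions has to handle, and this sketch does not attempt it.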
