New anchor selection methods for image retrieval

Anchoring is a technique for representing objects by their distances to a few well-chosen landmarks, or anchors. Objects are mapped to distance-based feature vectors, which can be used for content-based retrieval, classification, clustering, and relevance feedback of images, audio, and video. The anchoring transformation typically reduces dimensionality and replaces expensive similarity computations in the original domain with simple distance computations in the anchored feature domain, while guaranteeing no false dismissals. Anchoring is thus simple yet effective, and variants of it have been applied in speech recognition, audio classification, protein homology detection, and shape matching. In this paper, we describe the anchoring technique in some detail and study methods for anchor selection from both an analytical and an empirical standpoint. Most work to date has largely sidestepped this problem, either by using the entire set of objects as anchors or by selecting anchors greedily from among the objects. We generalize previous work by considering anchors from outside the object space and by deriving an analytical upper bound on the distance-approximation error of the method.
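To make the transformation concrete, the following is a minimal sketch of anchoring for range queries, not the implementation studied in the paper. It assumes the original distance is a metric (Euclidean distance stands in here for an expensive domain-specific distance) and compares anchored feature vectors with the maximum absolute coordinate difference. Under those assumptions the triangle inequality gives |d(x, a) - d(y, a)| <= d(x, y) for every anchor a, so the anchored distance never exceeds the original one and filtering on it cannot dismiss a true match. All function names are illustrative.

```python
import math


def euclidean(x, y):
    """Stand-in for the (possibly expensive) distance in the original domain."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def anchor_features(obj, anchors, dist=euclidean):
    """Map an object to its vector of distances to each anchor."""
    return [dist(obj, a) for a in anchors]


def anchored_lower_bound(fx, fy):
    """Max absolute coordinate difference between two anchored vectors.

    If the original distance is a metric, this is a lower bound on the
    original distance between the two objects (triangle inequality).
    """
    return max(abs(a - b) for a, b in zip(fx, fy))


def range_query(query, objects, anchors, radius, dist=euclidean):
    """Return indices of objects within `radius` of `query`.

    Step 1 filters cheaply in the anchored domain; step 2 verifies the
    surviving candidates with the original distance.
    """
    fq = anchor_features(query, anchors, dist)
    # In practice these feature vectors would be precomputed once per database.
    features = [anchor_features(o, anchors, dist) for o in objects]
    candidates = [
        i for i, f in enumerate(features)
        if anchored_lower_bound(fq, f) <= radius
    ]
    return [i for i in candidates if dist(query, objects[i]) <= radius]
```

The filter step costs one distance computation per anchor for the query plus cheap vector comparisons; the original distance is evaluated only for candidates that pass the filter. How many candidates survive depends on the choice of anchors, which is the selection problem the paper addresses.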
