BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval

This paper describes BoostMap, a method for efficient nearest neighbor retrieval under computationally expensive distance measures. Database and query objects are embedded into a vector space in which distances can be measured efficiently. Each embedding is treated as a classifier that predicts for any three objects X, A, B whether X is closer to A or to B. It is shown that a linear combination of such embedding-based classifiers naturally corresponds to an embedding and a distance measure. Based on this property, the BoostMap method reduces the problem of embedding construction to the classical boosting problem of combining many weak classifiers into an optimized strong classifier. The classification accuracy of the resulting strong classifier is a direct measure of the amount of nearest neighbor structure preserved by the embedding. An important property of BoostMap is that the embedding optimization criterion is equally valid in both metric and nonmetric spaces. Performance is evaluated in databases of hand images, handwritten digits, and time series. In all cases, BoostMap significantly improves retrieval efficiency with small losses in accuracy compared to brute-force search. Moreover, BoostMap significantly outperforms existing nearest neighbor retrieval methods such as Lipschitz embeddings, FastMap, and VP-trees.
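The correspondence between a weighted combination of embedding-based classifiers and a single embedding with its own distance measure can be illustrated with a minimal sketch. It assumes one-dimensional embeddings of the form F(x) = D(x, r) for a reference object r, and a weighted L1 distance in the embedded space. The function names, the toy distance, the reference objects, and the weights below are hypothetical choices made for illustration. They are not taken from the paper, and the weights are fixed by hand rather than learned by boosting.

```python
from typing import Callable, List, Sequence

def reference_embedding(dist: Callable[[float, float], float],
                        ref: float) -> Callable[[float], float]:
    """1D embedding: map x to its distance from a fixed reference object."""
    return lambda x: dist(x, ref)

def triple_classifier(embed: Callable[[float], float],
                      q: float, a: float, b: float) -> float:
    """Positive output predicts q is closer to a, negative predicts closer to b."""
    return abs(embed(q) - embed(b)) - abs(embed(q) - embed(a))

def weighted_l1(u: Sequence[float], v: Sequence[float],
                weights: Sequence[float]) -> float:
    """Weighted L1 distance between two embedded vectors."""
    return sum(w * abs(ui - vi) for w, ui, vi in zip(weights, u, v))

# Stand-in for an expensive distance measure (assumption: objects are numbers).
def toy_distance(x: float, y: float) -> float:
    return abs(x - y)

# Hypothetical reference objects and hand-picked classifier weights.
refs = [0.0, 5.0, 10.0]
weights = [0.5, 0.3, 0.2]
embeds = [reference_embedding(toy_distance, r) for r in refs]

def embed_all(x: float) -> List[float]:
    """Multi-dimensional embedding built by stacking the 1D embeddings."""
    return [f(x) for f in embeds]

q, a, b = 2.0, 3.0, 9.0

# Weighted sum of the individual triple classifiers.
combined = sum(w * triple_classifier(f, q, a, b)
               for w, f in zip(weights, embeds))

# Same quantity computed from the stacked embedding and the weighted L1 distance.
via_distance = (weighted_l1(embed_all(q), embed_all(b), weights)
                - weighted_l1(embed_all(q), embed_all(a), weights))

print(combined, via_distance)
```

In this toy setting the two quantities agree, and the positive sign matches the fact that q is closer to a than to b under the toy distance. This is the property that lets classifier weights be read as the weights of a distance measure in the embedded space.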
