RCMap: Efficiently Creating High-Quality Euclidean Embeddings

For many applications in computer vision and multimedia, similarity between objects is measured by a dissimilarity function that is complex, expensive to compute, and often non-metric. To allow fast distance computations, these objects may be embedded into a vector space, where the distance between the embeddings of two objects approximates the actual dissimilarity between them. Traditional sparse embedding methods like FastMap and SparseMap allow embedding a new object into a vector space based on its dissimilarities with only a small set of objects. However, these methods do not optimize embedding quality, and may create embeddings that do not approximate the original dissimilarities well. BoostMap improves embedding quality, but incurs high computational cost. In this work, we propose RCMap, a technique that offers significant speedup over BoostMap, with minimal loss in embedding quality.
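To make the general idea of a sparse embedding concrete, the following is a minimal sketch of a reference-object embedding, in which an object is mapped to the vector of its dissimilarities to a small set of reference objects. It is not the RCMap, BoostMap, FastMap, or SparseMap algorithm; the names `embed`, `euclidean`, and `toy_dissimilarity`, and the choice of reference objects, are hypothetical and serve only as illustration.

```python
import math

def embed(obj, references, dissimilarity):
    """Map obj to the vector of its dissimilarities to each reference object.

    Only len(references) evaluations of the expensive dissimilarity are needed.
    """
    return [dissimilarity(obj, r) for r in references]

def euclidean(u, v):
    """Cheap Euclidean distance between two embedded vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def toy_dissimilarity(a, b):
    """Hypothetical stand-in for an expensive, possibly non-metric dissimilarity."""
    return abs(a - b)

# Hypothetical reference objects; real methods select these far more carefully.
references = [0.0, 10.0]

x = embed(2.0, references, toy_dissimilarity)
y = embed(5.0, references, toy_dissimilarity)

# Distance in the embedded space, used in place of the original dissimilarity.
approx = euclidean(x, y)
exact = toy_dissimilarity(2.0, 5.0)
```

In this toy case the embedded distance differs from the original dissimilarity, which illustrates why embedding quality depends on how the embedding is constructed.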
