Optimizing Massive Distance Computations in Pattern Recognition

It is a common task in pattern recognition to evaluate the similarity of large data objects, which are often represented by high-dimensional vectors. A frequently used mathematical model for evaluating their similarity is to view them as points (vectors) in a high-dimensional space and to compute their distances from each other. The "distance," however, can be defined in a very complicated way; it may be much more complex than the well-known Euclidean distance. Therefore, the algorithmic bottleneck often becomes the number of distance computations that need to be carried out. We consider the case when we have to compute all the distances between n objects, where n is large. Without any shortcuts this takes n(n−1)/2 = O(n²) distance computations. In those applications where the distances are complicated, being defined by sophisticated algorithms (such as in speech and image recognition), a quadratically growing number of distance computations becomes a severe bottleneck. We prove the following general result that can help eliminate the bottleneck: for a large and general class of distances it is possible to obtain a very close approximation of each of the O(n²) pairwise distances of n objects by carrying out only a linear number of distance computations, which is optimal with respect to the order of magnitude. Moreover, the approximation factor can be made arbitrarily close to 1, making the approximation error negligible. The side computations needed to achieve this reduction can also be done in polynomial time.
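To make the cost accounting concrete, the sketch below (a minimal Python illustration, not the construction proved in the paper) contrasts the naive quadratic evaluation with a simple landmark scheme that uses only n·k exact distance computations for a fixed number k of landmarks; for any metric, the triangle inequality then brackets every remaining pairwise distance between a lower and an upper bound. The function names and the parameter k are illustrative choices, and these bounds do not carry the arbitrarily-close-to-1 approximation guarantee established in the paper.

    import random

    def pairwise_distances_naive(points, dist):
        # Baseline: all n(n-1)/2 exact distance computations.
        n = len(points)
        out = {}
        for i in range(n):
            for j in range(i + 1, n):
                out[(i, j)] = dist(points[i], points[j])
        return out

    def pairwise_distance_bounds_landmark(points, dist, k=16, seed=0):
        # Illustrative landmark sketch (not the paper's construction):
        # only n*k exact distance computations, i.e. linear in n for fixed k.
        # Each object is replaced by its vector of distances to k randomly
        # chosen landmark objects; for a metric, the triangle inequality then
        # yields a lower and an upper bound on every pairwise distance
        # without computing it directly.
        rng = random.Random(seed)
        n = len(points)
        landmarks = rng.sample(range(n), k)
        # n*k calls to the expensive distance function
        profile = [[dist(p, points[l]) for l in landmarks] for p in points]

        bounds = {}
        for i in range(n):
            for j in range(i + 1, n):
                lower = max(abs(a - b) for a, b in zip(profile[i], profile[j]))
                upper = min(a + b for a, b in zip(profile[i], profile[j]))
                bounds[(i, j)] = (lower, upper)
        return bounds

    if __name__ == "__main__":
        # Euclidean distance stands in here for an expensive, complicated metric.
        pts = [(random.random(), random.random()) for _ in range(200)]
        euclid = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
        approx = pairwise_distance_bounds_landmark(pts, euclid, k=16)
        exact = pairwise_distances_naive(pts, euclid)
        lo, hi = approx[(0, 1)]
        assert lo <= exact[(0, 1)] <= hi

The point of the sketch is only the accounting: the expensive dist function is invoked n·k times instead of n(n−1)/2 times, while the remaining work is cheap arithmetic on the stored landmark profiles.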
