The Metric Nearness Problem with Applications

Many practical applications in machine learning require pairwise distances among a set of objects. It is often desirable that these distance measurements satisfy the properties of a metric, especially the triangle inequality. Applications that could benefit from the metric property include data clustering and metric-based indexing of databases. In this paper, we present the metric nearness problem: given a dissimilarity matrix, find the “nearest” matrix of distances that satisfies the triangle inequalities. A weight matrix in the formulation captures the confidence in individual dissimilarity measures, including the case of altogether missing distances. For an important class of nearness measures, the problem can be attacked with convex optimization techniques. A pleasing aspect of this formulation is that globally optimal solutions can be computed. Experiments on sample dissimilarity matrices, including some from biology, are presented.
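To make the triangle-inequality requirement concrete, the following minimal Python sketch checks a small dissimilarity matrix for violations and then repairs it. It is an illustration only and not the method of the paper: the repair step is the classical shortest-path (Floyd-Warshall) closure, which only ever decreases entries and ignores the weight matrix, so it generally does not return the nearest metric under a chosen nearness measure. The function names `triangle_violations` and `shortest_path_closure` and the example matrix `D` are assumptions made for this sketch.

```python
from itertools import permutations


def triangle_violations(d, tol=1e-12):
    """List all ordered triples (i, j, k) with d[i][j] > d[i][k] + d[k][j]."""
    n = len(d)
    return [
        (i, j, k)
        for i, j, k in permutations(range(n), 3)
        if d[i][j] > d[i][k] + d[k][j] + tol
    ]


def shortest_path_closure(d):
    """Replace every entry by the shortest-path length between its endpoints.

    Assumes d is symmetric with a zero diagonal and nonnegative entries.
    The result satisfies all triangle inequalities, but it is a
    decrease-only repair, not a weighted nearest-metric solution.
    """
    n = len(d)
    m = [row[:] for row in d]  # work on a copy, leave the input untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m


# Hypothetical 3-object dissimilarity matrix: the 0-2 entry is too large.
D = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 2.0],
    [5.0, 2.0, 0.0],
]

M = shortest_path_closure(D)
print(triangle_violations(D))  # violated triples in the input
print(triangle_violations(M))  # empty list after the repair
```

A nearest-metric formulation would instead minimize a weighted norm of the difference between the input and the repaired matrix subject to the same inequalities, allowing entries to move in either direction.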
