The Metric Nearness Problem

Metric nearness is the problem of optimally restoring metric properties to distance measurements that are nonmetric because of measurement error or for other reasons. Metric data matter in many settings, including clustering, classification, metric-based indexing, query processing, and graph-theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric, principally the triangle inequality. To solve this problem, the paper develops efficient triangle fixing algorithms based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all-pairs shortest paths problem. The paper exploits this equivalence to develop a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. Experiments demonstrate the computational superiority of triangle fixing over general-purpose convex programming software. Finally, the paper concludes by suggesting various useful extensions and generalizations of metric nearness.
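The link to all-pairs shortest paths can be illustrated with a small sketch. It assumes that the special case in question is the one in which distances may only be decreased; the abstract does not say which special case is meant, so that reading is an assumption, as are the hypothetical names `shortest_path_closure` and `violates_triangle`. The code is textbook Floyd-Warshall, not the paper's primal-dual algorithm and not triangle fixing.

```python
from itertools import product


def shortest_path_closure(d):
    """Return the all-pairs shortest path distances of a dense matrix.

    Assumption: d is a square list of lists with nonnegative entries
    and a zero diagonal, read as edge weights of a complete graph.
    """
    n = len(d)
    m = [row[:] for row in d]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = m[i][k] + m[k][j]
                if via_k < m[i][j]:
                    m[i][j] = via_k  # entries are only ever decreased
    return m


def violates_triangle(m, tol=1e-12):
    """True if some triple (i, j, k) has m[i][j] > m[i][k] + m[k][j]."""
    n = len(m)
    return any(
        m[i][j] > m[i][k] + m[k][j] + tol
        for i, j, k in product(range(n), repeat=3)
    )


# Toy dissimilarities: the entry 5 is longer than the detour 1 + 2.
D = [
    [0, 1, 5],
    [1, 0, 2],
    [5, 2, 0],
]
M = shortest_path_closure(D)
```

On this toy input, the only entry that changes is the one violating the triangle inequality, which shrinks to the length of the detour through the middle point. The general metric nearness problem also allows distances to increase, which is where the triangle fixing algorithms described in the abstract apply.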
