Learning sparse metrics via linear programming

Computing object similarity, for example through a distance function, is a common component of data-mining and machine-learning algorithms. The efficiency of this computation matters because distances are usually evaluated a large number of times, the classical example being query-by-example (finding objects similar to a given query object). Moreover, the performance of these algorithms depends critically on choosing a good distance function. In practice, however, (1) the correct distance function is often unknown or chosen by hand, and (2) its computation can be expensive (e.g., for high-dimensional objects). In this paper, we propose a method for constructing relative-distance-preserving, low-dimensional mappings (sparse mappings). The method learns unknown distance functions (or approximates known ones) while also reducing distance-computation time. We present an algorithm that, given examples of proximity comparisons among triples of objects (object i is more like object j than object k), learns a distance function, in as few dimensions as possible, that preserves these distance relationships. The formulation solves a linear programming optimization problem that finds an optimal mapping for the given dataset and distance relationships. Unlike other popular embedding algorithms, this method generalizes easily to new points, has no local minima, and explicitly models computational efficiency by finding a sparse mapping, i.e., one that depends on a small subset of the features or dimensions. Experimental evaluation shows that the proposed formulation compares favorably with a state-of-the-art method on several publicly available datasets.
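To make the construction concrete, the following is a minimal sketch of one common way to cast sparse metric learning from relative comparisons as a linear program. It restricts attention to a diagonal (per-feature weighted) L1 metric so that the distance is linear in the weights; the function name `learn_sparse_metric`, the unit margin, and the slack penalty `C` are illustrative assumptions, not the paper's exact formulation, which may use a more general mapping.

```python
import numpy as np
from scipy.optimize import linprog

def learn_sparse_metric(X, triples, C=1.0):
    """Learn nonnegative per-feature weights w for the metric
    d(x, y) = sum_f w_f * |x_f - y_f| from relative comparisons.

    Each triple (i, j, k) encodes "object i is more like j than k",
    i.e. we ask for d(x_i, x_k) >= d(x_i, x_j) + 1, up to a slack.
    Minimizing the L1 norm of w encourages sparsity, so features
    with w_f == 0 can be dropped from future distance computations.
    """
    n, d = X.shape
    m = len(triples)
    # Decision vector z = [w (d entries), xi (m slack entries)];
    # objective is ||w||_1 + C * sum(xi), linear in z.
    c = np.concatenate([np.ones(d), C * np.ones(m)])

    # For triple t = (i, j, k): g_t . w + xi_t >= 1, where
    # g_t = |x_i - x_k| - |x_i - x_j| (elementwise absolute differences).
    A_ub = np.zeros((m, d + m))
    for t, (i, j, k) in enumerate(triples):
        g = np.abs(X[i] - X[k]) - np.abs(X[i] - X[j])
        A_ub[t, :d] = -g           # linprog uses A_ub @ z <= b_ub,
        A_ub[t, d + t] = -1.0      # so negate the >= constraint
    b_ub = -np.ones(m)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (d + m), method="highs")
    return res.x[:d]  # learned weights; zeros mark unused features

# Toy usage: feature 0 separates the objects, feature 1 is noise.
X = np.array([[0.0, 3.1], [0.1, 0.2], [5.0, 0.3]])
w = learn_sparse_metric(X, triples=[(0, 1, 2)])
print(w)  # the weight on feature 1 is driven to zero
```

Because both the objective and the comparison constraints are linear in the weights, an off-the-shelf LP solver finds the global optimum, which is why this family of formulations has no local minima, and the L1 objective is what drives weights of uninformative features exactly to zero.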
