Learning a Mahalanobis Metric with Side Information

Many learning algorithms use a metric defined over the input space as a principal tool, and their performance critically depends on the quality of this metric. We address the problem of learning metrics using side-information in the form of equivalence constraints. We demonstrate that, unlike labels, this type of side-information can sometimes be obtained automatically, without human intervention. We show how such side-information can be used to modify the representation of the data, leading to improved clustering and classification. Specifically, we present the Relevant Component Analysis (RCA) algorithm, a simple and efficient algorithm for learning a full-rank Mahalanobis metric. We show that RCA is the solution of an interesting optimization problem, founded on an information-theoretic basis. If the Mahalanobis matrix is allowed to be singular, we show that Fisher's linear discriminant followed by RCA is the optimal dimensionality-reduction algorithm under the same criterion. Moreover, under certain Gaussian assumptions, RCA can be viewed as a maximum-likelihood (ML) estimator of the within-class covariance matrix. We conclude with extensive empirical evaluations of RCA, showing its advantage over alternative methods.
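
The core computation the abstract alludes to is estimating a within-class covariance matrix from small groups of points known to share a class (positive equivalence constraints, often called chunklets) and whitening the data with it. Below is a minimal NumPy sketch of that idea, not the authors' reference implementation; the function name and arguments are illustrative, and it assumes the estimated covariance is nonsingular (the full-rank case discussed above).

```python
import numpy as np

def rca_transform(X, chunklets):
    """Sketch of an RCA-style whitening transform.

    X         : (n, d) data matrix.
    chunklets : list of index arrays; points sharing an index array are
                assumed (via equivalence constraints) to come from the
                same, possibly unknown, class.
    Returns W such that X @ W.T is the transformed representation;
    the learned Mahalanobis metric is W.T @ W = C^{-1}.
    """
    d = X.shape[1]
    C = np.zeros((d, d))
    n_points = 0
    for idx in chunklets:
        Xc = X[idx] - X[idx].mean(axis=0)   # center each chunklet on its own mean
        C += Xc.T @ Xc                      # accumulate within-chunklet scatter
        n_points += len(idx)
    C /= n_points                           # within-class covariance estimate
    # Whitening transform C^{-1/2}, assuming C is positive definite.
    eigvals, eigvecs = np.linalg.eigh(C)
    W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return W
```

As a usage sketch, chunklets could be formed from temporally adjacent video frames or other automatically gathered co-occurrence cues, after which nearest-neighbor classification or clustering is run on `X @ W.T` instead of the raw features.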
