Learning Metrics via Discriminant Kernels and Multidimensional Scaling: Toward Expected Euclidean Representation

Distance-based methods in machine learning and pattern recognition rely on a distance metric between points in the input space. Instead of specifying a metric a priori, we seek to learn the metric from data via kernel methods and multidimensional scaling (MDS) techniques. In the classification setting, we define discriminant kernels on the joint space of inputs and outputs and present a specific family of discriminant kernels. This family is attractive because the induced metrics are Euclidean and Fisher separable, and MDS techniques can be used to find low-dimensional Euclidean representations (also called feature vectors) of the induced metrics. Since the feature vectors incorporate information from both the input points and their corresponding labels, and since they enjoy Fisher separability, they are well suited for use in distance-based classifiers.
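As a rough illustration of the kernel-to-MDS pipeline described above, the following Python/NumPy sketch builds a kernel on the joint (input, label) space, converts it to squared kernel-induced distances d(a, b)^2 = k(a, a) + k(b, b) - 2 k(a, b), and applies classical (Torgerson) MDS to recover feature vectors. The joint kernel used here, an RBF input kernel multiplied elementwise by a hypothetical label kernel that equals 1 for matching labels and rho otherwise, is an assumption chosen only because it is positive semidefinite; it is not the discriminant kernel family proposed in the paper. The function names (rbf_kernel, label_kernel, joint_kernel, kernel_distances_sq, classical_mds) and the parameters gamma and rho are illustrative. Because the kernel is positive semidefinite, the induced distances are Euclidean, so the double-centered matrix has no meaningfully negative eigenvalues.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gaussian kernel on the input space: exp(-gamma * ||x - x'||^2).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def label_kernel(y, rho=0.1):
    # Hypothetical label kernel (assumption): 1 for equal labels, rho otherwise.
    # For 0 <= rho < 1 this matrix is positive semidefinite.
    same = y[:, None] == y[None, :]
    return np.where(same, 1.0, rho)

def joint_kernel(X, y, gamma=1.0, rho=0.1):
    # The elementwise (Schur) product of two PSD kernels is PSD,
    # so this is a valid kernel on the joint (input, label) space.
    return rbf_kernel(X, gamma) * label_kernel(y, rho)

def kernel_distances_sq(K):
    # Squared kernel-induced distance: k(a, a) + k(b, b) - 2 k(a, b).
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2.0 * K

def classical_mds(D2, dim=2):
    # Classical MDS: double-center -1/2 * D2, then keep the top eigenpairs.
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:dim]
    lam = np.clip(evals[order], 0.0, None)  # drop tiny negative round-off
    return evecs[:, order] * np.sqrt(lam)

# Usage on two synthetic, labeled Gaussian clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(0.5, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
K = joint_kernel(X, y, gamma=0.5, rho=0.1)
D2 = kernel_distances_sq(K)
Z = classical_mds(D2, dim=2)
print(Z.shape)  # (40, 2)
```

Keeping all n coordinates (dim = n) reproduces the squared distances D2 up to floating-point error; truncating to the largest eigenvalues gives the low-dimensional representation.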
