A Kernel Approach for Semi-Supervised Metric Learning

While distance function learning for supervised learning tasks has a long history, its extension to learning tasks with weaker supervisory information has been studied only recently. In particular, several methods have been proposed for semi-supervised metric learning based on pairwise similarity or dissimilarity information. In this paper, we propose a kernel approach to semi-supervised metric learning and present two special cases of it in detail. The metric learning problem is formulated as a kernel learning optimization problem. An attractive property of this optimization problem is that it is convex, so every local optimum is a global optimum. The first special case admits a closed-form solution, while the second is solved with an iterative majorization procedure that converges asymptotically to the optimal solution. Experimental results on both synthetic and real-world data show that this kernel approach is promising for nonlinear metric learning.
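The abstract does not reproduce the optimization problem itself, so the following Python sketch is purely illustrative and is not the authors' formulation, their closed-form solution, or their iterative majorization procedure. It shows one generic way to adapt a base kernel matrix to pairwise similarity/dissimilarity constraints: a simple iterative update that increases kernel values for similar pairs, decreases them for dissimilar pairs, and re-projects onto the positive semidefinite cone after each step. All function names, parameters, and the update rule are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Base RBF kernel matrix for data X of shape (n, d)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def project_psd(K):
    """Project a symmetric matrix onto the positive semidefinite cone."""
    K = (K + K.T) / 2
    w, V = np.linalg.eigh(K)
    return (V * np.clip(w, 0, None)) @ V.T

def learn_kernel(X, similar_pairs, dissimilar_pairs,
                 step=0.1, n_iter=100, gamma=1.0):
    """Illustrative (assumed) kernel adaptation: raise kernel values for
    similar pairs, lower them for dissimilar pairs, and restore positive
    semidefiniteness after each update."""
    K = rbf_kernel(X, gamma)
    for _ in range(n_iter):
        G = np.zeros_like(K)
        for i, j in similar_pairs:       # pull similar points together
            G[i, j] += 1.0
            G[j, i] += 1.0
        for i, j in dissimilar_pairs:    # push dissimilar points apart
            G[i, j] -= 1.0
            G[j, i] -= 1.0
        K = project_psd(K + step * G)
    return K

# Toy usage: two clusters with a few labelled pairs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(4, 1, (10, 2))])
K = learn_kernel(X, similar_pairs=[(0, 1), (10, 11)],
                 dissimilar_pairs=[(0, 10), (1, 11)])
# Squared distance in the feature space induced by the learned kernel:
d2 = K[0, 0] + K[1, 1] - 2 * K[0, 1]
print("learned squared distance between a similar pair:", d2)
```

Whatever the specific kernel learning formulation, the learned kernel K induces a nonlinear metric through the feature-space squared distance K[i, i] + K[j, j] - 2 * K[i, j], which is what a kernel approach to metric learning ultimately provides.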
