Relative Comparison Kernel Learning with Auxiliary Kernels

We consider the problem of learning a positive semidefinite kernel matrix from relative comparisons of the form "object A is more similar to object B than it is to object C", where the comparisons are provided by humans. Existing solutions assume that many comparisons are available to learn a meaningful kernel. This assumption is unrealistic for many real-world tasks, because large amounts of human input are often costly or difficult to obtain, so in practice only a limited number of comparisons may be provided. We propose a kernel learning approach that supplements the few relative comparisons with "auxiliary" kernels built from more easily extractable features, in order to learn a kernel that more completely models the notion of similarity expressed in the human feedback. The proposed formulation is a convex optimization problem that adds only minor overhead compared with methods that use no auxiliary information. Empirical results show that, when few training comparisons are available, our method learns kernels that generalize to more out-of-sample comparisons than methods that do not use auxiliary information, and than comparable metric learning methods.
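As an illustration of the setting only, the following minimal sketch shows how a kernel matrix can be checked against relative comparisons. It relies on the standard kernel-induced squared distance d(i, j) = K[i][i] + K[j][j] - 2 K[i][j]. The additive combination of a base kernel with non-negatively weighted auxiliary kernels is an assumption made for this example and is not taken from the formulation above; the function names, the toy data, and the weights are likewise hypothetical.

```python
# Illustrative sketch only: names, toy data, and the additive
# combination rule are assumptions, not the paper's formulation.

def kernel_distance_sq(K, i, j):
    """Squared distance between objects i and j induced by kernel matrix K."""
    return K[i][i] + K[j][j] - 2.0 * K[i][j]

def triplet_accuracy(K, triplets):
    """Fraction of triplets (a, b, c) for which a is closer to b than to c under K."""
    satisfied = sum(
        1 for a, b, c in triplets
        if kernel_distance_sq(K, a, b) < kernel_distance_sq(K, a, c)
    )
    return satisfied / len(triplets)

def combine_kernels(K0, aux_kernels, weights):
    """Return K0 + sum_k weights[k] * aux_kernels[k] with non-negative weights.

    A non-negative combination of positive semidefinite matrices is itself
    positive semidefinite, so the result is a valid kernel if the inputs are.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(K0)
    K = [row[:] for row in K0]  # copy so K0 is not modified
    for w, Ka in zip(weights, aux_kernels):
        for i in range(n):
            for j in range(n):
                K[i][j] += w * Ka[i][j]
    return K

# Toy data: three objects described by a single scalar feature.
features = [0.0, 1.0, 3.0]
# Auxiliary linear kernel built from that feature.
K_aux = [[x * y for y in features] for x in features]
# Base kernel with no information (all zeros is trivially PSD).
K0 = [[0.0] * 3 for _ in range(3)]

K = combine_kernels(K0, [K_aux], [1.0])

# Each triplet (a, b, c) reads: "a is more similar to b than to c".
triplets = [(0, 1, 2), (2, 1, 0), (0, 2, 1)]
accuracy = triplet_accuracy(K, triplets)
print(accuracy)
```

In this toy example the first two comparisons agree with the auxiliary feature and the third contradicts it, so two of the three triplets are satisfied. Counting satisfied held-out triplets in this way is one simple reading of "generalizing to out-of-sample comparisons".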
