Guaranteed Classification via Regularized Similarity Learning

Learning an appropriate (dis)similarity function from the available data is a central problem in machine learning, since the success of many learning algorithms depends critically on the choice of a similarity function for comparing examples. Although many approaches to similarity metric learning have been proposed, there has been little theoretical study of the link between similarity metric learning and the classification performance of the resulting classifier. In this letter, we propose regularized similarity learning formulations associated with general matrix norms and establish their generalization bounds. We show that the generalization error of the resulting linear classifier can be bounded by the derived generalization bound for similarity learning, so that good generalization of the learned similarity function guarantees good classification performance of the resulting linear classifier. Our results extend and improve those obtained by Bellet, Habrard, and Sebban (2012). Because their techniques rely on the notion of uniform stability (Bousquet & Elisseeff, 2002), the bound obtained there holds only for Frobenius matrix-norm regularization. Our techniques, which use the Rademacher complexity (Bartlett & Mendelson, 2002) and a related Khinchin-type inequality, allow us to establish bounds for regularized similarity learning formulations associated with general matrix norms, including the sparse L1-norm and the mixed (2,1)-norm.
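
To make the setting concrete, the sketch below illustrates, under simplifying assumptions, the kind of pipeline the abstract refers to: a bilinear similarity K_M(x, x') = x^T M x' is learned by minimizing a hinge-type empirical loss over labeled examples and a set of reference (landmark) points plus a matrix-norm regularizer (Frobenius, entrywise L1, or mixed (2,1)), and a linear classifier is then built from the learned similarity in the spirit of Balcan and Blum (2006) and Bellet, Habrard, and Sebban (2012). The plain subgradient solver, all function names, and the hyperparameters (beta, gamma, learning rate, iteration count) are illustrative choices, not the formulation or analysis of the letter itself.

```python
import numpy as np

def bilinear_similarity(M, X1, X2):
    """K_M(x, x') = x^T M x' for every row of X1 against every row of X2."""
    return X1 @ M @ X2.T

def learn_similarity(X, y, ref_idx, norm="fro", beta=0.1, gamma=1.0,
                     lr=0.01, n_iter=500):
    """Plain subgradient descent on a hinge-type objective (illustrative only):
        (1/n) sum_i [1 - (y_i / gamma) * mean_j y_j K_M(x_i, x_j)]_+  +  beta * ||M||,
    where j ranges over the reference (landmark) points and ||M|| is a
    Frobenius, entrywise L1, or mixed (2,1) matrix norm."""
    n, d = X.shape
    M = np.zeros((d, d))
    XR, yR = X[ref_idx], y[ref_idx]
    r = len(ref_idx)
    for _ in range(n_iter):
        margins = y * (bilinear_similarity(M, X, XR) @ yR) / (r * gamma)
        active = margins < 1.0                       # points with nonzero hinge loss
        # Subgradient of the empirical term:
        #   -(1/(n r gamma)) * (sum_{i active} y_i x_i) (sum_j y_j x_j)^T
        G = -np.outer((X[active] * y[active][:, None]).sum(axis=0),
                      (XR * yR[:, None]).sum(axis=0)) / (n * r * gamma)
        # Subgradient of the matrix-norm regularizer.
        if norm == "fro":
            G += beta * M / (np.linalg.norm(M) + 1e-12)
        elif norm == "l1":                           # promotes entrywise sparsity
            G += beta * np.sign(M)
        elif norm == "l21":                          # mixed (2,1)-norm: sum of row 2-norms
            G += beta * M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-12)
        M -= lr * G
    return M

def similarity_classifier(M, X_ref, y_ref, X_test, gamma=1.0):
    """Linear classifier built from the learned similarity: the sign of the
    average signed similarity to the reference points."""
    return np.sign(bilinear_similarity(M, X_test, X_ref) @ y_ref / (len(y_ref) * gamma))

# Toy usage on two Gaussian classes in R^5.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (50, 5)), rng.normal(-1.0, 1.0, (50, 5))])
y = np.concatenate([np.ones(50), -np.ones(50)])
ref_idx = rng.choice(len(X), size=20, replace=False)
M = learn_similarity(X, y, ref_idx, norm="l1")
pred = similarity_classifier(M, X[ref_idx], y[ref_idx], X)
print("training accuracy:", np.mean(pred == y))
```

The choice of regularizer in the sketch mirrors the norms discussed in the abstract: the Frobenius norm corresponds to the setting covered by the uniform-stability analysis of Bellet, Habrard, and Sebban (2012), while the entrywise L1 and mixed (2,1) subgradients illustrate the sparsity-inducing norms that the Rademacher-complexity analysis additionally accommodates.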

[1] Colin Campbell, et al. Analysis of SVM with Indefinite Kernels, 2009, NIPS.

[2] V. Koltchinskii, et al. Empirical margin distributions and bounding the generalization error of combined classifiers, 2002, math/0405343.

[3] B. Bollobás, Surveys in Combinatorics, 1979.

[4] Kaizhu Huang, et al. Sparse Metric Learning via Smooth Optimization, 2009, NIPS.

[5] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[6] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.

[7] Wei Liu, et al. Learning Distance Metrics with Contextual Constraints for Image Retrieval, 2006, CVPR'06.

[8] Prateek Jain, et al. Similarity-based Learning via Data Driven Embeddings, 2011, NIPS.

[9] C. Campbell, et al. Generalization bounds for learning the kernel, 2009.

[10] Qiong Cao, et al. Generalization bounds for metric and similarity learning, 2012, Machine Learning.

[11] Colin McDiarmid, et al. Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.

[12] Prateek Jain, et al. Supervised Learning with Similarity Functions, 2012, NIPS.

[13] E. Giné, et al. Decoupling: From Dependence to Independence, 1998.

[14] Tomer Hertz, et al. Learning a Mahalanobis Metric from Equivalence Constraints, 2005, J. Mach. Learn. Res.

[15] Andreas Maurer, et al. Learning Similarity with Operator-valued Large-margin Classifiers, 2008, J. Mach. Learn. Res.

[16] Mehryar Mohri, et al. Generalization Bounds for Learning Kernels, 2010, ICML.

[17] Ambuj Tewari, et al. Regularization Techniques for Learning with Matrices, 2009, J. Mach. Learn. Res.

[18] Manik Varma, et al. More generality in efficient multiple kernel learning, 2009, ICML '09.

[19] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.

[20] Maria-Florina Balcan, et al. On a theory of learning with similarity functions, 2006, ICML.

[21] Alexander J. Smola, et al. Learning with non-positive kernels, 2004, ICML.

[22] Koray Kavukcuoglu, et al. A Binary Classification Framework for Two-Stage Multiple Kernel Learning, 2012, ICML.

[23] Rong Jin, et al. Regularized Distance Metric Learning: Theory and Algorithm, 2009, NIPS.

[24] Tatsuya Akutsu, et al. Protein homology detection using string alignment kernels, 2004, Bioinformatics.

[25] Glenn Fung, et al. Learning sparse metrics via linear programming, 2006, KDD '06.

[26] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.

[27] Maria-Florina Balcan, et al. Improved Guarantees for Learning via Similarity Functions, 2008, COLT.

[28] Samy Bengio, et al. Large Scale Online Learning of Image Similarity through Ranking, 2009, IbPRIA.

[29] Nello Cristianini, et al. Learning the Kernel Matrix with Semidefinite Programming, 2002, J. Mach. Learn. Res.

[30] Alexander J. Smola, et al. Regularization with Dot-Product Kernels, 2000, NIPS.

[31] Inderjit S. Dhillon, et al. Information-theoretic metric learning, 2006, ICML '07.

[32] M. Talagrand, et al. Probability in Banach Spaces: Isoperimetry and Processes, 1991.

[33] Purushottam Kar, Generalization Guarantees for a Binary Classification Framework for Two-Stage Multiple Kernel Learning, 2013, arXiv.

[34] Maya R. Gupta, et al. Similarity-based Classification: Concepts and Algorithms, 2009, J. Mach. Learn. Res.

[35] Michael I. Jordan, et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[36] Mehryar Mohri, et al. Two-Stage Learning Kernel Algorithms, 2010, ICML.

[37] G. Lugosi, et al. Ranking and empirical minimization of U-statistics, 2006, math/0603123.

[38] Qiang Wu, et al. Regularization networks with indefinite kernels, 2013, J. Approx. Theory.

[39] Marc Sebban, et al. Similarity Learning for Provably Accurate Sparse Linear Classification, 2012, ICML.

[40] Cordelia Schmid, et al. Is that you? Metric learning approaches for face identification, 2009, ICCV.

[41] Kilian Q. Weinberger, et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.

[42] Colin Campbell, et al. Generalization Bounds for Learning the Kernel Problem, 2009, COLT.

[43] Ding-Xuan Zhou, et al. SVM Soft Margin Classifiers: Linear Programming versus Quadratic Programming, 2005, Neural Computation.

[44] Liwei Wang, et al. On learning with dissimilarity functions, 2007, ICML '07.