On Spectral Learning

In this paper, we study the problem of learning a matrix W from a set of linear measurements. Our formulation consists of solving an optimization problem that involves regularization with a spectral penalty term, that is, a penalty term which is a function of the spectrum of the covariance of W. Instances of this problem in machine learning include multi-task learning, collaborative filtering and multi-view learning, among others. Our goal is to elucidate the form of the optimal solution of spectral learning. The theory of spectral learning relies on von Neumann's characterization of orthogonally invariant norms and their association with symmetric gauge functions. Using this tool, we formulate a representer theorem for spectral regularization and specialize it to several useful examples, such as the Schatten p-norms, the trace norm and the spectral norm, which should prove useful in applications.
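To make the setting concrete, the following is a minimal sketch of the kind of problem described above; the notation (measurement matrices $X_i$, outputs $y_i$, error function $E$, regularization parameter $\lambda$) is ours, introduced for illustration rather than taken from the paper:

$$\min_{W \in \mathbb{R}^{d \times n}} \; E\bigl(\langle X_1, W\rangle, \dots, \langle X_m, W\rangle, y_1, \dots, y_m\bigr) \;+\; \lambda\, g\bigl(\sigma(W)\bigr),$$

where $\sigma(W)$ denotes the vector of singular values of $W$ and $g$ is a symmetric gauge function, that is, a norm invariant under permutations and sign changes of its argument. Von Neumann's theorem states that a matrix norm is orthogonally invariant exactly when it has this form. The Schatten $p$-norms correspond to $g = \|\cdot\|_p$: $p = 1$ gives the trace norm $\sum_i \sigma_i(W)$, $p = 2$ the Frobenius norm, and $p = \infty$ the spectral norm $\max_i \sigma_i(W)$.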
