Convex Sparse Coding, Subspace Learning, and Semi-Supervised Extensions

Automated feature discovery is a fundamental problem in machine learning. Although classical feature discovery methods do not guarantee optimal solutions in general, it has recently been noted that certain subspace learning and sparse coding problems can be solved efficiently, provided the number of features is not restricted a priori. We provide an extended characterization of this optimality result and describe the nature of the solutions in an expanded set of practical contexts. In particular, we apply the framework to a semi-supervised learning problem and demonstrate that feature discovery can co-occur with input reconstruction and supervised training while still admitting globally optimal solutions. A comparison with existing semi-supervised feature discovery methods shows improved generalization and efficiency.
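As a rough illustration of how a feature discovery problem can admit a global optimum once the number of features is left unrestricted, the sketch below assumes a squared reconstruction loss with a trace-norm (nuclear-norm) regularizer on the reconstruction Z. This specific objective, the function name `singular_value_threshold`, and the synthetic data are assumptions made for illustration; they are not taken from the abstract and are not claimed to be the authors' algorithm. Under these assumptions the minimizer is available in closed form by soft-thresholding the singular values of X, and a basis B with unit-norm columns and a coefficient matrix Phi can be read off afterwards.

```python
import numpy as np

def singular_value_threshold(X, alpha):
    """Minimize 0.5*||X - Z||_F^2 + alpha*||Z||_tr over Z (assumed objective).

    Returns a factorization Z = B @ Phi where B has unit-norm columns.
    The number of features (columns of B) is chosen by the optimization,
    not fixed in advance.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - alpha, 0.0)      # soft-threshold singular values
    keep = s_shrunk > 0                        # features that survive shrinkage
    B = U[:, keep]                             # basis: orthonormal columns
    Phi = s_shrunk[keep, None] * Vt[keep, :]   # coefficients: scaled right vectors
    return B, Phi

# Synthetic data (hypothetical): low-rank signal plus small noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 50))
X += 0.1 * rng.standard_normal((20, 50))

B, Phi = singular_value_threshold(X, alpha=2.0)
Z = B @ Phi
print("features kept:", B.shape[1])
print("trace norm of Z:", np.linalg.norm(Z, "nuc"))
print("sum of row norms of Phi:", np.linalg.norm(Phi, axis=1).sum())
```

In this construction the sum of the row norms of Phi equals the trace norm of Z, which gives some intuition for why a row-sparsity penalty on the coefficients and a trace-norm penalty on the reconstruction can be related when B is constrained to unit-norm columns.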
