Supervised Matrix Factorization with sparseness constraints and fast inference

Non-negative Matrix Factorization is a technique for decomposing large data sets into bases and code words, where all entries of the matrices involved are non-negative. A recently proposed variant also incorporates sparseness constraints, so that the number of nonzero entries in both bases and code words can be controlled. This paper extends Non-negative Matrix Factorization with Sparseness Constraints in two ways. First, a modification of the optimization criterion ensures fast inference of the code words, which makes the approach real-time capable and suitable for time-critical applications. Second, when a teacher signal is associated with the samples, it is taken into account so that the inferred code words of different classes can be well distinguished. The derived bases therefore generate discriminative code words, which is a crucial prerequisite for training powerful classifiers. Experiments on natural image patches show, in line with recent results on sparse coding algorithms, that Gabor-like filters minimize the reconstruction error while retaining inference capabilities. However, applying the approach with the teacher signal to handwritten digits yields morphologically completely different bases, while achieving superior classification results.
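
As a minimal illustration of the two ingredients this work builds on, the sketch below shows plain Non-negative Matrix Factorization using the multiplicative updates of Lee and Seung, together with the sparseness measure used by Hoyer. It is not the supervised, fast-inference method proposed in this paper; the function names and parameters (`nmf`, `hoyer_sparseness`, `k`, `iters`, `eps`) are assumptions chosen for illustration only.

```python
import numpy as np


def hoyer_sparseness(x):
    """Sparseness of a vector based on the ratio of its L1 and L2 norms.

    Equals 0 when all entries have the same magnitude and 1 when
    exactly one entry is nonzero. Requires at least two entries and
    a nonzero vector.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Plain NMF, V ~ W @ H, with multiplicative updates.

    V: non-negative data matrix of shape (d, n), one sample per column.
    W: bases of shape (d, k); H: code words of shape (k, n).
    No sparseness constraint or teacher signal is used here.
    """
    rng = np.random.default_rng(seed)
    d, n = V.shape
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Updates for the squared Frobenius reconstruction error;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical usage on synthetic non-negative data of rank 5.
rng = np.random.default_rng(1)
V = rng.random((20, 5)) @ rng.random((5, 30))
W, H = nmf(V, k=5)
mean_code_sparseness = np.mean([hoyer_sparseness(h) for h in H.T])
```

Because the updates only multiply by non-negative ratios, `W` and `H` stay non-negative throughout; a sparseness-constrained variant would additionally project the columns of `W` or `H` to a target sparseness value after each step.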
