Score Function Features for Discriminative Learning

Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and labeled samples for any related task. Our class of features are based on higher-order score functions, which capture local variations in the probability density function of the input. We establish a theoretical framework to characterize the nature of discriminative information that can be extracted from score-function features, when used in conjunction with labeled samples. We employ efficient spectral decomposition algorithms (on matrices and tensors) for extracting discriminative components. The advantage of employing tensor-valued features is that we can extract richer discriminative information in the form of an overcomplete representations. Thus, we present a novel framework for employing generative models of the input for discriminative learning.

[1]  Laurens van der Maaten,et al.  Learning Discriminative Fisher Kernels , 2011, ICML.

[2]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[3]  Yoshua Bengio,et al.  Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.

[4]  Feiping Nie,et al.  Robust and Discriminative Self-Taught Learning , 2013, ICML.

[5]  J. Kruskal Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics , 1977 .

[6]  Anima Anandkumar,et al.  Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates , 2014, ArXiv.

[7]  Anima Anandkumar,et al.  Provable Methods for Training Neural Networks with Sparse Connectivity , 2014, ICLR.

[8]  Yiming Yang,et al.  Flexible latent variable models for multi-task learning , 2008, Machine Learning.

[9]  Anima Anandkumar,et al.  Provable Tensor Methods for Learning Mixtures of Classifiers , 2014, ArXiv.

[10]  Anima Anandkumar,et al.  Learning Overcomplete Latent Variable Models through Tensor Methods , 2014, COLT.

[11]  Anima Anandkumar,et al.  Online tensor methods for learning latent variable models , 2013, J. Mach. Learn. Res..

[12]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[13]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[15]  Anima Anandkumar,et al.  Provable Tensor Methods for Learning Mixtures of Generalized Linear Models , 2014, AISTATS.

[16]  Quoc V. Le,et al.  ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning , 2011, NIPS.

[17]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[18]  Paul Mineiro,et al.  Discriminative Features via Generalized Eigenvectors , 2013, ICML.