Score Function Features for Discriminative Learning: Matrix and Tensor Framework

Author(s): Janzamin, Majid; Sedghi, Hanie; Anandkumar, Anima | Abstract: Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision, and natural language processing. In this paper, we consider a novel class of matrix- and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and labeled samples for any related task. Our class of features is based on higher-order score functions, which capture local variations in the probability density function of the input. We establish a theoretical framework to characterize the nature of discriminative information that can be extracted from score-function features, when used in conjunction with labeled samples. We employ efficient spectral decomposition algorithms (on matrices and tensors) for extracting discriminative components. The advantage of employing tensor-valued features is that we can extract richer discriminative information in the form of overcomplete representations. Thus, we present a novel framework for employing generative models of the input for discriminative learning.
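
Below is a minimal numerical sketch of the matrix case of this framework, assuming the input is Gaussian so that the second-order score function S_2(x) = (grad^2 p)(x) / p(x) = Sigma^{-1} x x^T Sigma^{-1} - Sigma^{-1} is available in closed form. The hidden directions U and the label function G in the code are synthetic, illustrative assumptions, not taken from the paper; the point they illustrate is the paper's Stein-type identity E[y * S_2(x)] = E[grad^2 G(x)], whose eigendecomposition exposes the discriminative subspace.

```python
import numpy as np

# Minimal sketch: matrix-valued score-function features for a Gaussian input,
# combined with labels via a Stein-type identity. The generative model
# (zero-mean Gaussian with known covariance) and the label function G are
# illustrative assumptions, not taken from the paper.

rng = np.random.default_rng(0)
n, d = 100_000, 8

# Unlabeled phase: (pre-)train a generative model of the input. Here we
# simply assume x ~ N(0, Sigma) with Sigma known.
A = rng.standard_normal((d, d))
Sigma = A @ A.T / d + np.eye(d)
Sigma_inv = np.linalg.inv(Sigma)
x = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# Matrix-valued score feature:
#   S_2(x) = Sigma^{-1} x x^T Sigma^{-1} - Sigma^{-1}.
xw = x @ Sigma_inv                                  # each row is Sigma^{-1} x
S2 = xw[:, :, None] * xw[:, None, :] - Sigma_inv    # shape (n, d, d)

# Labeled phase: labels depend on x only through two hidden directions
# (synthetic ground truth for this demo).
U = np.linalg.qr(rng.standard_normal((d, 2)))[0]    # orthonormal pair u1, u2
y = np.tanh(x @ U[:, 0] + 1.0) + 0.5 * (x @ U[:, 1]) ** 2  # E[y | x] = G(x)

# Cross-moment E[y * S_2(x)] = E[grad^2 G(x)] (Stein-type identity); its
# dominant eigenvectors span the discriminative subspace span{u1, u2}.
M = np.einsum('i,ijk->jk', y, S2) / n
eigvals, eigvecs = np.linalg.eigh(M)
est = eigvecs[:, np.argsort(-np.abs(eigvals))[:2]]

# Cosines of the principal angles between the estimate and U; near 1 on success.
print(np.linalg.svd(est.T @ U, compute_uv=False))
```

In the tensor case, the same identity with the third-order score function gives E[y * S_3(x)] = E[grad^3 G(x)], and decomposing this tensor (rather than a matrix) is what allows the framework to extract more discriminative components than the input dimension, i.e., overcomplete representations.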
