Gradients as Features for Deep Representation Learning

We address the challenging problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are the gradients of a task-specific loss with respect to the model parameters, evaluated at a given input sample. Our key innovation is the design of a linear model that incorporates both the gradient features and the activations of the network. We show that our model provides a local linear approximation to an underlying deep model, and discuss important theoretical insights. Moreover, we present an efficient algorithm for training and inference with our model that avoids computing the actual gradients. Our method is evaluated across a number of representation learning tasks, on several datasets, and with different network architectures. We demonstrate strong results in all settings, and these results are well aligned with our theoretical insights.
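To make the construction concrete, below is a minimal sketch of the gradient-feature idea: for each input, concatenate the network's activation h(x) with the gradient of a task-specific loss with respect to (a subset of) the model parameters, then train a linear model on the combined feature. Everything here is an illustrative assumption, not the paper's exact recipe: the names (backbone, head, clf), the layer sizes, and the label-free surrogate loss (cross-entropy against the uniform distribution) are all placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for a pretrained network; all sizes and names
# (backbone, head, 784, 128, 10) are illustrative assumptions.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
head = nn.Linear(128, 10)
for p in backbone.parameters():
    p.requires_grad_(False)  # the pretrained backbone stays frozen

def gradient_feature(x):
    """Per-sample feature: activation h(x) concatenated with the gradient
    of a task-specific loss w.r.t. the head parameters at that sample."""
    h = backbone(x.unsqueeze(0))          # activation h(x), shape (1, 128)
    logits = head(h)
    # Label-free surrogate loss: cross-entropy against the uniform
    # distribution (an assumption for this sketch, not the paper's choice).
    loss = -F.log_softmax(logits, dim=-1).mean()
    g, = torch.autograd.grad(loss, head.weight)      # gradient feature
    return torch.cat([h.squeeze(0), g.flatten()])    # [h(x), grad(x)]

x = torch.randn(784)                  # a toy input sample
feat = gradient_feature(x)            # dim = 128 + 128 * 10 = 1408
clf = nn.Linear(feat.numel(), 10)     # the linear model on combined features
scores = clf(feat)
```

Note that this sketch materializes the per-sample gradient explicitly for clarity; the efficient algorithm described in the abstract avoids computing these gradients outright, which is what makes the approach practical at scale.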
