Task2Vec: Task Embedding for Meta-Learning

We introduce a method to generate vectorial representations of visual classification tasks that can be used to reason about the nature of those tasks and their relations. Given a dataset with ground-truth labels and a loss function, we process images through a "probe network" and compute an embedding based on estimates of the Fisher information matrix associated with the probe network's parameters. This yields a fixed-dimensional embedding of the task that is independent of details such as the number of classes and requires no understanding of the class-label semantics. We demonstrate that this embedding predicts task similarities that match our intuition about semantic and taxonomic relations between different visual tasks. We further demonstrate its practical value for the meta-task of selecting a pre-trained feature extractor for a novel task, presenting a simple meta-learning framework that learns a metric on embeddings to predict which feature extractors will perform well on which tasks. Selecting a feature extractor via the task embedding yields performance close to that of the best available extractor, at substantially lower computational cost than exhaustively training and evaluating every model.
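
To make the construction concrete, the sketch below computes a diagonal-Fisher task embedding in PyTorch. It is a minimal illustration under stated assumptions, not the paper's reference code: the function name `task2vec_embedding`, the torchvision-style classifier head `probe.fc`, and the single-example `loader` are all placeholders, and the paper's Fisher estimates are replaced here by the plain empirical Fisher for brevity.

```python
# Minimal sketch of the embedding step described above, not the authors'
# reference implementation. Assumes a torchvision-style ResNet probe whose
# classifier head (`probe.fc`) has already been fine-tuned on the task, and
# a DataLoader `loader` yielding (image, label) batches of size 1.
import torch
import torch.nn.functional as F

def task2vec_embedding(probe, loader, device="cpu"):
    probe.to(device).eval()
    # Exclude the classifier head so the embedding length depends only on
    # the probe architecture, not on the task's number of classes.
    head_ids = {id(p) for p in probe.fc.parameters()}
    params = [p for p in probe.parameters() if id(p) not in head_ids]
    fisher = [torch.zeros_like(p) for p in params]
    n = 0
    for image, label in loader:
        image, label = image.to(device), label.to(device)
        # Empirical-Fisher approximation: squared gradients of the
        # log-likelihood of the ground-truth label, one example at a time.
        loss = F.cross_entropy(probe(image), label)
        grads = torch.autograd.grad(loss, params)
        for f, g in zip(fisher, grads):
            f += g.detach() ** 2
        n += 1
    # Average over the dataset and flatten the per-parameter Fisher
    # diagonals into a single fixed-dimensional task vector.
    return torch.cat([(f / n).flatten() for f in fisher])
```

Because every task embeds into the same space, two embeddings produced this way can be compared directly, e.g. with a cosine-style distance, which is the kind of similarity the meta-learning framework above would then learn to refine for feature-extractor selection.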
