Chameleon: Learning Model Initializations Across Tasks With Different Schemas

Parametric models, and neural networks in particular, require a weight initialization as the starting point for gradient-based optimization. In current practice this is almost always some form of random initialization. Recent work shows instead that a good initial parameter set can be learned from a population of tasks, i.e., datasets with target variables for supervised learning, and that starting from this learned initialization leads to faster convergence on new tasks (model-agnostic meta-learning). Current methods for learning model initializations, however, are limited to populations of tasks that share the same schema, i.e., the same number, order, type, and semantics of predictor and target variables. In this paper, we address the problem of meta-learning parameter initializations across tasks with different schemas, i.e., where the number of predictors varies across tasks while some variables are shared. We propose Chameleon, a model that learns to align different predictor schemas to a common representation; it is trained on permutations and masks of the predictors of the training tasks at hand. In experiments on real-world datasets, we show that Chameleon can successfully learn parameter initializations across tasks with different schemas, yielding an average lift in accuracy of 26% over random initialization and of 5% over a state-of-the-art method for learning model initializations with a fixed schema. To the best of our knowledge, ours is the first work on learning model initializations across tasks with different schemas.
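
The alignment idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's reference implementation: it uses PyTorch, and the class name SchemaAligner, the soft-assignment design, and all sizes are our own illustrative choices. The sketch scores each predictor column from its values alone, independent of the column's position in the task's schema, and softly assigns it to one of k slots of a common schema; the aligned output could then be fed to a base network whose initialization is meta-learned, e.g., with model-agnostic meta-learning.

```python
import torch
import torch.nn as nn

class SchemaAligner(nn.Module):
    """Maps a task with an arbitrary number of predictors onto a fixed
    common schema of size k via a soft column-to-slot assignment."""

    def __init__(self, k: int, hidden: int = 32):
        super().__init__()
        # Scores a single column value; pooling over the batch below makes
        # the score a function of the column's value distribution only.
        self.scorer = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, k)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_predictors), with n_predictors varying across tasks.
        b, n = x.shape
        cols = x.t().reshape(n, b, 1)           # one sample set per column
        logits = self.scorer(cols).mean(dim=1)  # (n, k) column-to-slot scores
        weights = torch.softmax(logits, dim=1)  # soft assignment of columns to slots
        return x @ weights                      # (batch, k) aligned representation

# One hypothetical self-supervised step: the aligner learns to undo a
# permutation of the predictor columns; masking columns to zero would
# additionally simulate tasks that observe only a subset of predictors.
aligner = SchemaAligner(k=8)
x = torch.randn(64, 8)             # toy task already in the common schema (k = 8)
x_perm = x[:, torch.randperm(8)]   # same task with a shuffled predictor schema
loss = nn.functional.mse_loss(aligner(x_perm), x)
loss.backward()
```

Because the scorer sees only a column's values, the module is indifferent to how many predictors a task has and in what order they arrive, which is the property the abstract's permutation-and-mask training is meant to exploit.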
