Reconciling meta-learning and continual learning with online mixtures of tasks

Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not advantageous, for instance, when tasks are considerably dissimilar or change over time. We use the connection between gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet process mixture of hierarchical Bayesian models over the parameters of an arbitrary parametric model such as a neural network. In contrast to consolidating inductive biases into a single set of hyperparameters, our approach of task-dependent hyperparameter selection better handles latent distribution shift, as demonstrated on a set of evolving, image-based, few-shot learning benchmarks.
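To make the core idea concrete, here is a minimal sketch (not the authors' released code) of task-dependent meta-parameter selection under a Chinese-restaurant-process prior over mixture components. It assumes linear regression tasks with squared loss, hard MAP assignment of each task to a component, and a Reptile-style interpolation standing in for the full hierarchical-Bayes meta-update; all function names, step sizes, and the spawn-a-new-component rule are illustrative choices, not the paper's exact algorithm.

```python
# Sketch: online mixture of meta-learned initializations.
# Each mixture component holds an initialization theta_k; a new task is
# adapted from every component, assigned to the component whose adapted
# parameters best explain the held-out (query) set under a CRP prior,
# and the winning component's initialization is then updated.
import numpy as np

rng = np.random.default_rng(0)

def inner_adapt(theta, X, y, lr=0.1, steps=5):
    """A few gradient steps on the support set (task-specific adaptation)."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad
    return theta

def query_loss(theta, X, y):
    return float(np.mean((X @ theta - y) ** 2))

def assign_and_update(components, counts, support, query,
                      alpha=1.0, meta_lr=0.5, sigma2=1.0):
    """Score every component (and a candidate new one) on the query set,
    take the MAP assignment, and meta-update the chosen initialization."""
    Xs, ys = support
    Xq, yq = query
    scores, adapted = [], []
    for theta, n in zip(components, counts):
        phi = inner_adapt(theta, Xs, ys)
        adapted.append(phi)
        # log CRP prior (proportional to count) + Gaussian query log-likelihood
        scores.append(np.log(n) - query_loss(phi, Xq, yq) / (2 * sigma2))
    # candidate new component, here initialized at zero for simplicity
    theta_new = np.zeros_like(components[0])
    phi_new = inner_adapt(theta_new, Xs, ys)
    scores.append(np.log(alpha) - query_loss(phi_new, Xq, yq) / (2 * sigma2))
    k = int(np.argmax(scores))          # hard (MAP) assignment
    if k == len(components):            # spawn a new mixture component
        components.append(theta_new.copy())
        counts.append(0)
        adapted.append(phi_new)
    counts[k] += 1
    # Reptile-style interpolation toward the adapted parameters, used here
    # as a simple stand-in for the hierarchical-Bayes hyperparameter update
    components[k] += meta_lr * (adapted[k] - components[k])
    return k

# Usage: a stream of tasks alternating between two latent clusters.
components, counts = [np.zeros(2)], [1]
for t in range(200):
    w = np.array([2.0, -1.0]) if t % 2 == 0 else np.array([-3.0, 0.5])
    X = rng.normal(size=(20, 2))
    y = X @ w + 0.1 * rng.normal(size=20)
    assign_and_update(components, counts, (X[:10], y[:10]), (X[10:], y[10:]))
```

Because assignment is made per task, a shift in the latent task distribution simply routes new tasks to a different (possibly freshly spawned) component rather than overwriting a single shared set of hyperparameters, which is the contrast the abstract draws with consolidated inductive biases.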
