An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning

Meta-learning aims to optimize the hyperparameters of a model class or training algorithm from the observation of data from a number of related tasks. Following the setting of Baxter [19], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The statistical properties of the task environment thus dictate the similarity of the tasks. The goal of the meta-learner is to ensure that the hyperparameters yield a small loss when used to train on a new task sampled from the task environment. The difference between the resulting average loss, known as the meta-population loss, and the corresponding empirical loss measured on the available data from related tasks is referred to as the meta-generalization gap, and it measures the generalization capability of the meta-learner. In this paper, we present novel information-theoretic bounds on the average absolute value of the meta-generalization gap. Unlike prior work [1], our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap. Task similarity is gauged via the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences. We illustrate the proposed bounds on the example of ridge regression with meta-learned bias.
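
The sketch below is a minimal, hedged illustration of the closing example, not the paper's experimental setup: each task t solves the biased ridge problem w_t(u) = argmin_w ||X_t w - y_t||^2 + lam * ||w - u||^2, which has the closed form w_t(u) = (X_t^T X_t + lam I)^{-1}(X_t^T y_t + lam u), and the bias u is meta-learned by gradient descent on the average per-task training loss. The synthetic task environment, dimensions, regularization weight lam, and optimization schedule are all illustrative assumptions.

```python
# Minimal sketch of ridge regression with a meta-learned bias u.
# All problem sizes and constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, num_tasks, n, lam = 5, 20, 10, 1.0   # feature dim, tasks, samples per task, ridge weight

# Synthetic task environment: task regressors are small perturbations of a common mean,
# so tasks are "similar" in the sense of a concentrated task distribution.
w_env = rng.normal(size=d)
tasks = []
for _ in range(num_tasks):
    w_true = w_env + 0.1 * rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    tasks.append((X, y))

def ridge_with_bias(X, y, u):
    """Closed-form minimizer of ||Xw - y||^2 + lam * ||w - u||^2."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * u)

def empirical_meta_loss(u):
    """Average per-task training MSE of the biased ridge solutions."""
    return np.mean([np.mean((X @ ridge_with_bias(X, y, u) - y) ** 2) for X, y in tasks])

# Meta-learn u by gradient descent on the empirical meta-loss.
# Since w_t(u) is affine in u with dw/du = lam * A_t^{-1}, the per-task gradient is
# lam * A_t^{-1} (2/n) X_t^T (X_t w_t - y_t), averaged over tasks.
u = np.zeros(d)
lr, steps = 5.0, 300
for _ in range(steps):
    grad = np.zeros(d)
    for X, y in tasks:
        A = X.T @ X + lam * np.eye(d)
        w = np.linalg.solve(A, X.T @ y + lam * u)
        grad += lam * np.linalg.solve(A, (2.0 / len(y)) * X.T @ (X @ w - y)) / num_tasks
    u -= lr * grad

print("empirical meta-loss, zero bias      :", empirical_meta_loss(np.zeros(d)))
print("empirical meta-loss, meta-learned u :", empirical_meta_loss(u))
```

The printed quantity is only the empirical (meta-training) loss; the meta-population loss discussed above would instead average the loss of the learned bias u over new tasks drawn from the same environment, and the difference between the two is the meta-generalization gap that the paper's bounds control.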

[1] Osvaldo Simeone et al., "Information-Theoretic Generalization Bounds for Meta-Learning and Applications," Entropy, 2020.

[2] Osvaldo Simeone et al., "Conditional Mutual Information Bound for Meta Generalization Gap," arXiv, 2020.

[3] Ron Meir et al., "Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory," ICML, 2017.

[4] Christoph H. Lampert et al., "A PAC-Bayesian Bound for Lifelong Learning," ICML, 2013.

[5] F. Alajaji et al., "Lecture Notes in Information Theory," 2000.

[6] Gholamali Aminian et al., "Jensen-Shannon Information Based Characterization of the Generalization Error of Learning Algorithms," 2020 IEEE Information Theory Workshop (ITW), 2020.

[7] Jonathan H. Manton et al., "Information-Theoretic Analysis for Transfer Learning," 2020 IEEE International Symposium on Information Theory (ISIT), 2020.

[8] Lei Zhang et al., "Generalization Bounds for Domain Adaptation," NIPS, 2012.

[9] Sergey Levine et al., "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks," ICML, 2017.

[10] Jianhua Lin, "Divergence Measures Based on the Shannon Entropy," IEEE Transactions on Information Theory, 1991.

[11] Massimiliano Pontil et al., "Incremental Learning-to-Learn with Statistical Guarantees," UAI, 2018.

[12] Thomas Steinke et al., "Reasoning About Generalization via Conditional Mutual Information," COLT, 2020.

[13] Koby Crammer et al., "Analysis of Representations for Domain Adaptation," NIPS, 2006.

[14] Osvaldo Simeone et al., "Transfer Meta-Learning: Information-Theoretic Bounds and Information Meta-Risk Minimization," IEEE Transactions on Information Theory, 2020.

[15] Hang Li et al., "Meta-SGD: Learning to Learn Quickly for Few-Shot Learning," arXiv, 2017.

[16] Shaofeng Zou et al., "Tightening Mutual Information Based Bounds on Generalization Error," 2019 IEEE International Symposium on Information Theory (ISIT), 2019.

[17] Maxim Raginsky et al., "Information-Theoretic Analysis of Generalization Capability of Learning Algorithms," NIPS, 2017.

[18] Andreas Krause et al., "PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees," ICML, 2020.

[19] Jonathan Baxter, "A Model of Inductive Bias Learning," Journal of Artificial Intelligence Research, 2000.

[20] Massimiliano Pontil et al., "The Advantage of Conditional Meta-Learning for Biased Regularization and Fine-Tuning," NeurIPS, 2020.

[21] James Zou et al., "Controlling Bias in Adaptive Data Analysis Using Information Theory," AISTATS, 2015.

[22] Giuseppe Durisi et al., "Generalization Bounds via Information Density and Conditional Information Density," IEEE Journal on Selected Areas in Information Theory, 2020.

[23] Gintare Karolina Dziugaite et al., "Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates," NeurIPS, 2019.

[24] Massimiliano Pontil et al., "Learning-to-Learn Stochastic Gradient Descent with Biased Regularization," ICML, 2019.

[25] Osvaldo Simeone et al., "Information-Theoretic Bounds on Transfer Generalization Gap Based on Jensen-Shannon Divergence," 2021 29th European Signal Processing Conference (EUSIPCO), 2021.

[26] Andreas Maurer, "Algorithmic Stability and Meta-Learning," Journal of Machine Learning Research, 2005.

[27] Toniann Pitassi et al., "Theoretical Bounds on Estimation Error for Meta-Learning," arXiv, 2020.