An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning

Meta-learning aims to optimize the hyperparameters of a model class or training algorithm from the observation of data from a number of related tasks. Following the setting of Baxter [19], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The statistical properties of the task environment thus dictate the similarity of the tasks. The goal of the meta-learner is to ensure that the hyperparameters yield a small loss when used to train on a new task sampled from the task environment. The difference between the resulting average loss, known as the meta-population loss, and the corresponding empirical loss measured on the available data from related tasks is referred to as the meta-generalization gap, and it measures the generalization capability of the meta-learner. In this paper, we present novel information-theoretic bounds on the average absolute value of the meta-generalization gap. Unlike prior work [1], our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap. Task similarity is gauged via the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences. We illustrate the proposed bounds on the example of ridge regression with meta-learned bias.
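
The sketch below is a minimal, hedged illustration of the closing example, not the paper's experimental setup: each task t solves the biased ridge problem w_t(u) = argmin_w ||X_t w - y_t||^2 + lam * ||w - u||^2, which has the closed form w_t(u) = (X_t^T X_t + lam I)^{-1}(X_t^T y_t + lam u), and the bias u is meta-learned by gradient descent on the average per-task training loss. The synthetic task environment, dimensions, regularization weight lam, and optimization schedule are all illustrative assumptions.

```python
# Minimal sketch of ridge regression with a meta-learned bias u.
# All problem sizes and constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, num_tasks, n, lam = 5, 20, 10, 1.0   # feature dim, tasks, samples per task, ridge weight

# Synthetic task environment: task regressors are small perturbations of a common mean,
# so tasks are "similar" in the sense of a concentrated task distribution.
w_env = rng.normal(size=d)
tasks = []
for _ in range(num_tasks):
    w_true = w_env + 0.1 * rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    tasks.append((X, y))

def ridge_with_bias(X, y, u):
    """Closed-form minimizer of ||Xw - y||^2 + lam * ||w - u||^2."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * u)

def empirical_meta_loss(u):
    """Average per-task training MSE of the biased ridge solutions."""
    return np.mean([np.mean((X @ ridge_with_bias(X, y, u) - y) ** 2) for X, y in tasks])

# Meta-learn u by gradient descent on the empirical meta-loss.
# Since w_t(u) is affine in u with dw/du = lam * A_t^{-1}, the per-task gradient is
# lam * A_t^{-1} (2/n) X_t^T (X_t w_t - y_t), averaged over tasks.
u = np.zeros(d)
lr, steps = 5.0, 300
for _ in range(steps):
    grad = np.zeros(d)
    for X, y in tasks:
        A = X.T @ X + lam * np.eye(d)
        w = np.linalg.solve(A, X.T @ y + lam * u)
        grad += lam * np.linalg.solve(A, (2.0 / len(y)) * X.T @ (X @ w - y)) / num_tasks
    u -= lr * grad

print("empirical meta-loss, zero bias      :", empirical_meta_loss(np.zeros(d)))
print("empirical meta-loss, meta-learned u :", empirical_meta_loss(u))
```

The printed quantity is only the empirical (meta-training) loss; the meta-population loss discussed above would instead average the loss of the learned bias u over new tasks drawn from the same environment, and the difference between the two is the meta-generalization gap that the paper's bounds control.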

[1] Osvaldo Simeone et al., "Information-Theoretic Generalization Bounds for Meta-Learning and Applications," Entropy, 2020.

[2] Osvaldo Simeone et al., "Conditional Mutual Information Bound for Meta Generalization Gap," arXiv, 2020.

[3] Ron Meir et al., "Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory," ICML, 2017.

[4] Christoph H. Lampert et al., "A PAC-Bayesian Bound for Lifelong Learning," ICML, 2013.

[5] F. Alajaji et al., "Lecture Notes in Information Theory," 2000.

[6] Gholamali Aminian et al., "Jensen-Shannon Information Based Characterization of the Generalization Error of Learning Algorithms," 2020 IEEE Information Theory Workshop (ITW), 2020.

[7] Jonathan H. Manton et al., "Information-Theoretic Analysis for Transfer Learning," 2020 IEEE International Symposium on Information Theory (ISIT), 2020.

[8] Lei Zhang et al., "Generalization Bounds for Domain Adaptation," NIPS, 2012.

[9] Sergey Levine et al., "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks," ICML, 2017.

[10] Jianhua Lin, "Divergence Measures Based on the Shannon Entropy," IEEE Transactions on Information Theory, 1991.

[11] Massimiliano Pontil et al., "Incremental Learning-to-Learn with Statistical Guarantees," UAI, 2018.

[12] Thomas Steinke et al., "Reasoning About Generalization via Conditional Mutual Information," COLT, 2020.

[13] Koby Crammer et al., "Analysis of Representations for Domain Adaptation," NIPS, 2006.

[14] Osvaldo Simeone et al., "Transfer Meta-Learning: Information-Theoretic Bounds and Information Meta-Risk Minimization," IEEE Transactions on Information Theory, 2020.

[15] Hang Li et al., "Meta-SGD: Learning to Learn Quickly for Few-Shot Learning," arXiv, 2017.

[16] Shaofeng Zou et al., "Tightening Mutual Information Based Bounds on Generalization Error," 2019 IEEE International Symposium on Information Theory (ISIT), 2019.

[17] Maxim Raginsky et al., "Information-Theoretic Analysis of Generalization Capability of Learning Algorithms," NIPS, 2017.

[18] Andreas Krause et al., "PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees," ICML, 2020.

[19] Jonathan Baxter, "A Model of Inductive Bias Learning," Journal of Artificial Intelligence Research, 2000.

[20] Massimiliano Pontil et al., "The Advantage of Conditional Meta-Learning for Biased Regularization and Fine-Tuning," NeurIPS, 2020.

[21] James Zou et al., "Controlling Bias in Adaptive Data Analysis Using Information Theory," AISTATS, 2015.

[22] Giuseppe Durisi et al., "Generalization Bounds via Information Density and Conditional Information Density," IEEE Journal on Selected Areas in Information Theory, 2020.

[23] Gintare Karolina Dziugaite et al., "Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates," NeurIPS, 2019.

[24] Massimiliano Pontil et al., "Learning-to-Learn Stochastic Gradient Descent with Biased Regularization," ICML, 2019.

[25] Osvaldo Simeone et al., "Information-Theoretic Bounds on Transfer Generalization Gap Based on Jensen-Shannon Divergence," 2021 29th European Signal Processing Conference (EUSIPCO), 2021.

[26] Andreas Maurer, "Algorithmic Stability and Meta-Learning," Journal of Machine Learning Research, 2005.

[27] Toniann Pitassi et al., "Theoretical Bounds on Estimation Error for Meta-Learning," arXiv, 2020.