Information-Theoretic Analysis of Epistemic Uncertainty in Bayesian Meta-learning

The overall predictive uncertainty of a trained predictor can be decomposed into separate contributions due to epistemic and aleatoric uncertainty. Under a Bayesian formulation, and assuming a well-specified model, the two contributions can be exactly expressed (for the log-loss) or bounded (for more general losses) in terms of information-theoretic quantities [1]. This paper studies epistemic uncertainty within an information-theoretic framework in the broader setting of Bayesian meta-learning. A general hierarchical Bayesian model is assumed in which hyperparameters determine the per-task priors of the model parameters. Exact characterizations (for the log-loss) and bounds (for more general losses) are derived for the epistemic uncertainty, quantified by the minimum excess meta-risk (MEMR), of optimal meta-learning rules. This characterization is leveraged to gain insight into the dependence of the epistemic uncertainty on the number of tasks and on the amount of per-task training data. Experiments are presented that use the proposed information-theoretic bounds, evaluated via neural mutual information estimators, to compare the performance of conventional learning and meta-learning as the number of meta-learning tasks increases.

Figure 1: A graphical model representation of the joint distribution of the relevant quantities for: (a) conventional Bayesian learning; and (b) Bayesian meta-learning.
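
As a rough, self-contained illustration of the quantities at play: for conventional Bayesian learning under the log-loss, the minimum excess risk is known to equal a conditional mutual information, I(W; Y | X, D), between the model parameter W and the test label Y given the test input X and the training data D [46]; the minimum excess meta-risk plays the analogous role in the hierarchical model of Figure 1(b). The sketch below is not taken from the paper: the Gaussian model, dimensions, and all names are illustrative assumptions. It samples independent realizations of a toy hierarchical model, in which a hyperparameter U sets the prior of the per-task parameter W and W in turn generates the per-task data, and then estimates a representative mutual information term with a MINE-style neural estimator [42], the kind of estimator used to evaluate the proposed bounds.

import torch
import torch.nn as nn

torch.manual_seed(0)

def sample_realizations(n, m, dim=2):
    # Toy hierarchical Gaussian model (illustrative): hyperparameter U ~ N(0, I),
    # per-task parameter W | U ~ N(U, 0.25 I), m per-task samples X | W ~ N(W, I).
    u = torch.randn(n, dim)
    w = u + 0.5 * torch.randn(n, dim)
    x = w.unsqueeze(1) + torch.randn(n, m, dim)
    return u, w, x

class StatisticsNetwork(nn.Module):
    # Critic T(w, x) for the Donsker-Varadhan (MINE) lower bound on I(W; X).
    def __init__(self, dim_w, dim_x, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_w + dim_x, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, w, x):
        return self.net(torch.cat([w, x], dim=-1))

def mine_lower_bound(net, w, x):
    # Donsker-Varadhan bound: E_joint[T] - log E_marginals[exp(T)].
    # Shuffling w breaks the pairing and approximates samples from p(W)p(X).
    n = w.size(0)
    joint = net(w, x).mean()
    shuffled = net(w[torch.randperm(n)], x)
    marginal = torch.logsumexp(shuffled, dim=0) - torch.log(torch.tensor(float(n)))
    return joint - marginal.squeeze()

# Estimate a lower bound on I(W; X_bar), with X_bar the per-task sample mean.
_, w, x = sample_realizations(n=4000, m=8)
x_bar = x.mean(dim=1)

net = StatisticsNetwork(dim_w=2, dim_x=2)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = -mine_lower_bound(net, w, x_bar)  # maximize the bound
    loss.backward()
    opt.step()

print(f"Estimated lower bound on I(W; X_bar): {-loss.item():.3f} nats")

For this Gaussian model the estimate can be compared against the closed-form value of I(W; X_bar), which makes the sketch a convenient sanity check for a neural mutual information estimator before it is applied to learned models.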

References

[1] Dilin Wang et al., Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, 2016, NIPS.

[2] Joshua Achiam et al., On First-Order Meta-Learning Algorithms, 2018, ArXiv.

[3] Andrew Gordon Wilson et al., The Case for Bayesian Deep Learning, 2020, ArXiv.

[4] Alex Beatson et al., Amortized Bayesian Meta-Learning, 2018, ICLR.

[5] Yee Whye Teh et al., Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[6] Sebastian Nowozin et al., Meta-Learning Probabilistic Inference for Prediction, 2018, ICLR.

[7] Maxim Raginsky et al., Information-theoretic analysis of stability and bias of learning algorithms, 2016, IEEE Information Theory Workshop (ITW).

[8] David J. C. MacKay et al., Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.

[9] Gábor Lugosi et al., Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.

[10] Himanshu Asnani et al., CCMI: Classifier based Conditional Mutual Information Estimation, 2019, UAI.

[11] Max Welling et al., Auto-Encoding Variational Bayes, 2013, ICLR.

[12] Shaofeng Zou et al., Tightening Mutual Information Based Bounds on Generalization Error, 2019, IEEE International Symposium on Information Theory (ISIT).

[13] Giuseppe Durisi et al., Fast-Rate Loss Bounds via Conditional Information Measures with Applications to Neural Networks, 2021, IEEE International Symposium on Information Theory (ISIT).

[14] Chang Liu et al., Riemannian Stein Variational Gradient Descent for Bayesian Inference, 2017, AAAI.

[15] Osvaldo Simeone et al., An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning, 2021.

[16] Geoffrey E. Hinton et al., Bayesian Learning for Neural Networks, 1995.

[17] Thomas M. Cover et al., Elements of Information Theory, 2005.

[18] Maxim Raginsky et al., Information-theoretic analysis of generalization capability of learning algorithms, 2017, NIPS.

[19] Andreas Krause et al., PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees, 2020, ICML.

[20] Jonathan Baxter et al., A Model of Inductive Bias Learning, 2000, J. Artif. Intell. Res.

[21] Theodoros Damoulas et al., Generalized Variational Inference, 2019, ArXiv.

[22] Ron Meir et al., Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory, 2017, ICML.

[23] Thomas L. Griffiths et al., Recasting Gradient-Based Meta-Learning as Hierarchical Bayes, 2018, ICLR.

[24] Christoph H. Lampert et al., A PAC-Bayesian bound for Lifelong Learning, 2013, ICML.

[25] Oriol Vinyals et al., Matching Networks for One Shot Learning, 2016, NIPS.

[26] Samy Bengio et al., Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML, 2020, ICLR.

[27] Zhenghao Chen et al., On Random Weights and Unsupervised Feature Learning, 2011, ICML.

[28] Thomas Steinke et al., Reasoning About Generalization via Conditional Mutual Information, 2020, COLT.

[29] Sebastian Thrun et al., Learning to Learn: Introduction and Overview, 1998, Learning to Learn.

[30] A. Barron, Jeffreys' prior is asymptotically least favorable under entropy risk, 1994.

[31] Stefano Ermon et al., Understanding the Limitations of Variational Mutual Information Estimators, 2020, ICLR.

[32] Gustavo Carneiro et al., Uncertainty in Model-Agnostic Meta-Learning using Variational Inference, 2020, IEEE Winter Conference on Applications of Computer Vision (WACV).

[33] Rui Gao et al., Learning While Dissipating Information: Understanding the Generalization Capability of SGLD, 2021, ArXiv.

[34] Yoshua Bengio et al., Bayesian Model-Agnostic Meta-Learning, 2018, NeurIPS.

[35] Andreas Maurer et al., Algorithmic Stability and Meta-Learning, 2005, J. Mach. Learn. Res.

[36] Osvaldo Simeone et al., Information-Theoretic Generalization Bounds for Meta-Learning and Applications, 2020, Entropy.

[37] Sergey Levine et al., Probabilistic Model-Agnostic Meta-Learning, 2018, NeurIPS.

[38] Natalia Gimelshein et al., PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[39] L. Goddard, Information Theory, 1962, Nature.

[40] Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006, Springer.

[41] Sergey Levine et al., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.

[42] Yoshua Bengio et al., Mutual Information Neural Estimation, 2018, ICML.

[43] Gintare Karolina Dziugaite et al., Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates, 2019, NeurIPS.

[44] Osvaldo Simeone et al., Conditional Mutual Information-Based Generalization Bound for Meta Learning, 2021, IEEE International Symposium on Information Theory (ISIT).

[45] James Zou et al., Controlling Bias in Adaptive Data Analysis Using Information Theory, 2015, AISTATS.

[46] M. Raginsky, Minimum Excess Risk in Bayesian Learning, 2020, IEEE Transactions on Information Theory.

[48] Sebastian Nowozin et al., Decision-Theoretic Meta-Learning: Versatile and Efficient Amortization of Few-Shot Learning, 2018, ArXiv.