Information-Theoretic Analysis of Epistemic Uncertainty in Bayesian Meta-learning

The overall predictive uncertainty of a trained predictor can be decomposed into separate contributions due to epistemic and aleatoric uncertainty. Under a Bayesian formulation, and assuming a well-specified model, the two contributions can be exactly expressed (for the log-loss) or bounded (for more general losses) in terms of information-theoretic quantities [1]. This paper studies epistemic uncertainty within an information-theoretic framework in the broader setting of Bayesian meta-learning. A general hierarchical Bayesian model is assumed in which hyperparameters determine the per-task priors of the model parameters. Exact characterizations (for the log-loss) and bounds (for more general losses) are derived for the epistemic uncertainty, quantified by the minimum excess meta-risk (MEMR), of optimal meta-learning rules. This characterization is leveraged to gain insight into the dependence of the epistemic uncertainty on the number of tasks and on the amount of per-task training data. Experiments are presented that use the proposed information-theoretic bounds, evaluated via neural mutual information estimators, to compare the performance of conventional learning and meta-learning as the number of meta-learning tasks increases.

Figure 1: A graphical model representation of the joint distribution of the relevant quantities for: (a) conventional Bayesian learning; and (b) Bayesian meta-learning.
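
As a rough, self-contained illustration of the quantities at play: for conventional Bayesian learning under the log-loss, the minimum excess risk is known to equal a conditional mutual information, I(W; Y | X, D), between the model parameter W and the test label Y given the test input X and the training data D [46]; the minimum excess meta-risk plays the analogous role in the hierarchical model of Figure 1(b). The sketch below is not taken from the paper: the Gaussian model, dimensions, and all names are illustrative assumptions. It samples independent realizations of a toy hierarchical model, in which a hyperparameter U sets the prior of the per-task parameter W and W in turn generates the per-task data, and then estimates a representative mutual information term with a MINE-style neural estimator [42], the kind of estimator used to evaluate the proposed bounds.

import torch
import torch.nn as nn

torch.manual_seed(0)

def sample_realizations(n, m, dim=2):
    # Toy hierarchical Gaussian model (illustrative): hyperparameter U ~ N(0, I),
    # per-task parameter W | U ~ N(U, 0.25 I), m per-task samples X | W ~ N(W, I).
    u = torch.randn(n, dim)
    w = u + 0.5 * torch.randn(n, dim)
    x = w.unsqueeze(1) + torch.randn(n, m, dim)
    return u, w, x

class StatisticsNetwork(nn.Module):
    # Critic T(w, x) for the Donsker-Varadhan (MINE) lower bound on I(W; X).
    def __init__(self, dim_w, dim_x, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_w + dim_x, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, w, x):
        return self.net(torch.cat([w, x], dim=-1))

def mine_lower_bound(net, w, x):
    # Donsker-Varadhan bound: E_joint[T] - log E_marginals[exp(T)].
    # Shuffling w breaks the pairing and approximates samples from p(W)p(X).
    n = w.size(0)
    joint = net(w, x).mean()
    shuffled = net(w[torch.randperm(n)], x)
    marginal = torch.logsumexp(shuffled, dim=0) - torch.log(torch.tensor(float(n)))
    return joint - marginal.squeeze()

# Estimate a lower bound on I(W; X_bar), with X_bar the per-task sample mean.
_, w, x = sample_realizations(n=4000, m=8)
x_bar = x.mean(dim=1)

net = StatisticsNetwork(dim_w=2, dim_x=2)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = -mine_lower_bound(net, w, x_bar)  # maximize the bound
    loss.backward()
    opt.step()

print(f"Estimated lower bound on I(W; X_bar): {-loss.item():.3f} nats")

For this Gaussian model the estimate can be compared against the closed-form value of I(W; X_bar), which makes the sketch a convenient sanity check for a neural mutual information estimator before it is applied to learned models.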

References

[1] Dilin Wang et al., Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, 2016, NIPS.

[2] Joshua Achiam et al., On First-Order Meta-Learning Algorithms, 2018, ArXiv.

[3] Andrew Gordon Wilson et al., The Case for Bayesian Deep Learning, 2020, ArXiv.

[4] Alex Beatson et al., Amortized Bayesian Meta-Learning, 2018, ICLR.

[5] Yee Whye Teh et al., Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[6] Sebastian Nowozin et al., Meta-Learning Probabilistic Inference for Prediction, 2018, ICLR.

[7] Maxim Raginsky et al., Information-theoretic analysis of stability and bias of learning algorithms, 2016, IEEE Information Theory Workshop (ITW).

[8] David J. C. MacKay et al., Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.

[9] Gábor Lugosi et al., Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.

[10] Himanshu Asnani et al., CCMI: Classifier based Conditional Mutual Information Estimation, 2019, UAI.

[11] Max Welling et al., Auto-Encoding Variational Bayes, 2013, ICLR.

[12] Shaofeng Zou et al., Tightening Mutual Information Based Bounds on Generalization Error, 2019, IEEE International Symposium on Information Theory (ISIT).

[13] Giuseppe Durisi et al., Fast-Rate Loss Bounds via Conditional Information Measures with Applications to Neural Networks, 2021, IEEE International Symposium on Information Theory (ISIT).

[14] Chang Liu et al., Riemannian Stein Variational Gradient Descent for Bayesian Inference, 2017, AAAI.

[15] Osvaldo Simeone et al., An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning, 2021.

[16] Geoffrey E. Hinton et al., Bayesian Learning for Neural Networks, 1995.

[17] Thomas M. Cover et al., Elements of Information Theory, 2005.

[18] Maxim Raginsky et al., Information-theoretic analysis of generalization capability of learning algorithms, 2017, NIPS.

[19] Andreas Krause et al., PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees, 2020, ICML.

[20] Jonathan Baxter et al., A Model of Inductive Bias Learning, 2000, J. Artif. Intell. Res.

[21] Theodoros Damoulas et al., Generalized Variational Inference, 2019, ArXiv.

[22] Ron Meir et al., Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory, 2017, ICML.

[23] Thomas L. Griffiths et al., Recasting Gradient-Based Meta-Learning as Hierarchical Bayes, 2018, ICLR.

[24] Christoph H. Lampert et al., A PAC-Bayesian bound for Lifelong Learning, 2013, ICML.

[25] Oriol Vinyals et al., Matching Networks for One Shot Learning, 2016, NIPS.

[26] Samy Bengio et al., Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML, 2020, ICLR.

[27] Zhenghao Chen et al., On Random Weights and Unsupervised Feature Learning, 2011, ICML.

[28] Thomas Steinke et al., Reasoning About Generalization via Conditional Mutual Information, 2020, COLT.

[29] Sebastian Thrun et al., Learning to Learn: Introduction and Overview, 1998, Learning to Learn.

[30] A. Barron, Jeffreys' prior is asymptotically least favorable under entropy risk, 1994.

[31] Stefano Ermon et al., Understanding the Limitations of Variational Mutual Information Estimators, 2020, ICLR.

[32] Gustavo Carneiro et al., Uncertainty in Model-Agnostic Meta-Learning using Variational Inference, 2020, IEEE Winter Conference on Applications of Computer Vision (WACV).

[33] Rui Gao et al., Learning While Dissipating Information: Understanding the Generalization Capability of SGLD, 2021, ArXiv.

[34] Yoshua Bengio et al., Bayesian Model-Agnostic Meta-Learning, 2018, NeurIPS.

[35] Andreas Maurer et al., Algorithmic Stability and Meta-Learning, 2005, J. Mach. Learn. Res.

[36] Osvaldo Simeone et al., Information-Theoretic Generalization Bounds for Meta-Learning and Applications, 2020, Entropy.

[37] Sergey Levine et al., Probabilistic Model-Agnostic Meta-Learning, 2018, NeurIPS.

[38] Natalia Gimelshein et al., PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[39] L. Goddard, Information Theory, 1962, Nature.

[40] Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006, Springer.

[41] Sergey Levine et al., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.

[42] Yoshua Bengio et al., Mutual Information Neural Estimation, 2018, ICML.

[43] Gintare Karolina Dziugaite et al., Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates, 2019, NeurIPS.

[44] Osvaldo Simeone et al., Conditional Mutual Information-Based Generalization Bound for Meta Learning, 2021, IEEE International Symposium on Information Theory (ISIT).

[45] James Zou et al., Controlling Bias in Adaptive Data Analysis Using Information Theory, 2015, AISTATS.

[46] M. Raginsky, Minimum Excess Risk in Bayesian Learning, 2020, IEEE Transactions on Information Theory.

[48] Sebastian Nowozin et al., Decision-Theoretic Meta-Learning: Versatile and Efficient Amortization of Few-Shot Learning, 2018, ArXiv.