A Short Note on the Relationship of Information Gain and Eluder Dimension

Eluder dimension and information gain are two widely used methods of complexity measures in bandit and reinforcement learning. Eluder dimension was originally proposed as a general complexity measure of function classes, but the common examples of where it is known to be small are function spaces (vector spaces). In these cases, the primary tool to upper bound the eluder dimension is the elliptic potential lemma. Interestingly, the elliptic potential lemma also features prominently in the analysis of linear bandits/reinforcement learning and their nonparametric generalization, the information gain. We show that this is not a coincidence – eluder dimension and information gain are equivalent in a precise sense for reproducing kernel Hilbert spaces.

[1]  Ruosong Wang,et al.  Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension , 2020, NeurIPS.

[2]  Michael I. Jordan,et al.  On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces , 2021 .

[3]  David Simchi-Levi,et al.  Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective , 2020, COLT.

[4]  Chi Jin,et al.  Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms , 2021, NeurIPS.

[5]  Benjamin Van Roy,et al.  Model-based Reinforcement Learning and the Eluder Dimension , 2014, NIPS.

[6]  Aditya Gopalan,et al.  On Kernelized Multi-armed Bandits , 2017, ICML.

[7]  Alexander Rakhlin,et al.  Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles , 2020, ICML.

[8]  Benjamin Van Roy,et al.  An Information-Theoretic Analysis of Thompson Sampling , 2014, J. Mach. Learn. Res..

[9]  Ruosong Wang,et al.  Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity , 2020, ArXiv.

[10]  Andreas Krause,et al.  Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.

[11]  Nathan Srebro,et al.  Eluder Dimension and Generalized Rank , 2021, ArXiv.

[12]  Thomas P. Hayes,et al.  Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.

[13]  Shachar Lovett,et al.  Bilinear Classes: A Structural Framework for Provable Generalization in RL , 2021, ICML.

[14]  Benjamin Van Roy,et al.  Eluder Dimension and the Sample Complexity of Optimistic Exploration , 2013, NIPS.

[15]  Sattar Vakili,et al.  On Information Gain and Regret Bounds in Gaussian Process Bandits , 2020, AISTATS.