Refined Regret for Adversarial MDPs with Linear Function Approximation