Near-optimal Policy Identification in Active Reinforcement Learning
Xiang Li | Viraj Mehta | Johannes Kirschner | Ian Char | Willie Neiswanger | Ilija Bogunovic | J. Schneider | A. Krause
[1] J. Schneider, et al. Exploration via Planning for Information about the Optimal Trajectory, 2022, NeurIPS.
[2] Yu Chen, et al. Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation, 2022, ICML.
[3] Shuang Liu, et al. Provably Efficient Kernelized Q-Learning, 2022, arXiv.
[4] Martin A. Riedmiller, et al. Magnetic control of tokamak plasmas through deep reinforcement learning, 2022, Nature.
[5] J. Schneider, et al. An Experimental Design Perspective on Model-Based Reinforcement Learning, 2021, ICLR.
[6] Andreas Krause, et al. Misspecified Gaussian Process Bandit Optimization, 2021, NeurIPS.
[7] Csaba Szepesvári, et al. Efficient Local Planning with Linear Function Approximation, 2021, ALT.
[8] Jianqing Fan, et al. Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model, 2021, NeurIPS.
[9] Ke Alexander Wang, et al. Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information, 2021, ICML.
[10] Shachar Lovett, et al. Bilinear Classes: A Structural Framework for Provable Generalization in RL, 2021, ICML.
[11] Quanquan Gu, et al. Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes, 2020, COLT.
[12] José Miguel Hernández-Lobato, et al. Symmetry-Aware Actor-Critic for 3D Molecular Design, 2020, ICLR.
[13] Roshan Shariff, et al. Efficient Planning in Large MDPs with Weak Linear Function Approximation, 2020, NeurIPS.
[14] Csaba Szepesvári, et al. Bandit Algorithms, 2020.
[15] Mengdi Wang, et al. Model-Based Reinforcement Learning with Value-Targeted Regression, 2020, L4DC.
[16] Kenneth O. Stanley, et al. First return, then explore, 2020, Nature.
[17] E. Kaufmann, et al. Kernel-Based Reinforcement Learning: A Finite-Time Analysis, 2020, ICML.
[18] Mykel J. Kochenderfer, et al. Learning Near Optimal Policies with Low Inherent Bellman Error, 2020, ICML.
[19] Michael Pearce, et al. Practical Bayesian Optimization of Objectives with Conditioning Variables, 2020.
[20] A. Krause, et al. Distributionally Robust Bayesian Optimization, 2020, AISTATS.
[21] José Miguel Hernández-Lobato, et al. Reinforcement Learning for Molecular Design Guided by Quantum Mechanics, 2020, ICML.
[22] Csaba Szepesvári, et al. Learning with Good Feature Representations in Bandits and in RL with a Generative Model, 2019, ICML.
[23] Lin F. Yang, et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, 2019, ICLR.
[24] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[25] Pieter Abbeel, et al. Benchmarking Model-Based Reinforcement Learning, 2019, arXiv.
[26] Peter L. Bartlett, et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction, 2019, ICML.
[27] Kirthevasan Kandasamy, et al. Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly, 2019, J. Mach. Learn. Res.
[28] Jürgen Branke, et al. Continuous multi-task Bayesian Optimisation with correlation, 2018, Eur. J. Oper. Res.
[29] Wojciech Jaśkowski, et al. Model-Based Active Exploration, 2018, ICML.
[30] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[31] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[32] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[33] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[34] Alessandro Lazaric, et al. Best-Arm Identification in Linear Bandits, 2014, NIPS.
[35] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[36] Jasper Snoek, et al. Multi-Task Bayesian Optimization, 2013, NIPS.
[37] D. Ginsbourger, et al. A benchmark of kriging-based infill criteria for noisy optimization, 2013, Structural and Multidisciplinary Optimization.
[38] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[39] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[40] Ambuj Tewari, et al. PAC Subset Selection in Stochastic Multi-armed Bandits, 2012, ICML.
[41] Andreas Krause, et al. Contextual Gaussian Process Bandit Optimization, 2011, NIPS.
[42] Alessandro Lazaric, et al. Multi-Bandit Best Arm Identification, 2011, NIPS.
[43] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[44] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[45] Warren B. Powell, et al. The Knowledge-Gradient Policy for Correlated Normal Beliefs, 2009, INFORMS J. Comput.
[46] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[47] Peter Auer, et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, 2006, NIPS.
[48] Claudio Gentile, et al. On the generalization ability of on-line learning algorithms, 2001, IEEE Transactions on Information Theory.
[49] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[50] James W. Daniel, et al. Splines and efficiency in dynamic programming, 1976.
[51] F. H. Branin. Widely convergent method for finding multiple solutions of simultaneous nonlinear equations, 1972.
[52] R. Bellman, et al. Polynomial approximation—a new computational technique in dynamic programming: Allocation processes, 1963.
[53] M. Boyer, et al. Offline Model-Based Reinforcement Learning for Tokamak Control, 2023, L4DC.
[54] Michael I. Jordan, et al. On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces, 2021.
[55] Y. Na, et al. Feedforward beta control in the KSTAR tokamak by deep reinforcement learning, 2021.
[56] Kirthevasan Kandasamy, et al. Offline Contextual Bayesian Optimization, 2019, NeurIPS.
[57] S. Kakade, et al. Reinforcement Learning: Theory and Algorithms, 2019.
[58] Csaba Szepesvári, et al. Online learning for linearly parametrized control problems, 2012.