[1] Nikhil R. Devanur, et al. An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives, 2015, COLT.
[2] Mengdi Wang, et al. Model-Based Reinforcement Learning with Value-Targeted Regression, 2020, L4DC.
[3] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[4] Craig Boutilier, et al. Stochastic dynamic programming with factored representations, 2000, Artif. Intell.
[5] Aleksandrs Slivkins, et al. Constrained episodic reinforcement learning in concave-convex and knapsack settings, 2020, NeurIPS.
[6] Nicholas Roy, et al. Provably Efficient Learning with Typed Parametric Models, 2009, J. Mach. Learn. Res.
[7] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[8] Lihong Li, et al. Policy Certificates: Towards Accountable Reinforcement Learning, 2018, ICML.
[9] Zhuoran Yang, et al. Provably Efficient Safe Exploration via Primal-Dual Policy Optimization, 2020, AISTATS.
[10] Xiaoyu Chen, et al. Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP, 2019, ICLR.
[11] Shobha Venkataraman, et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.
[12] Nan Jiang, et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.
[13] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.
[14] Benjamin Van Roy, et al. Information-Theoretic Confidence Bounds for Reinforcement Learning, 2019, NeurIPS.
[15] Benjamin Van Roy, et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration, 2013, NIPS.
[16] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[17] Suvrit Sra, et al. Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes, 2020, NeurIPS.
[18] Ruosong Wang, et al. Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension, 2020, NeurIPS.
[19] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
[20] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[21] Alessandro Lazaric, et al. Learning Near Optimal Policies with Low Inherent Bellman Error, 2020, ICML.
[22] Lillian J. Ratliff, et al. Constrained Upper Confidence Reinforcement Learning, 2020, L4DC.
[23] Robert Kleinberg, et al. Bandits with Knapsacks, 2018.
[24] Shie Mannor, et al. Exploration-Exploitation in Constrained MDPs, 2020, ArXiv.
[25] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[26] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[27] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[28] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[29] Benjamin Van Roy, et al. Model-based Reinforcement Learning and the Eluder Dimension, 2014, NIPS.
[30] Benjamin Van Roy, et al. Near-optimal Reinforcement Learning in Factored MDPs, 2014, NIPS.
[31] Ness B. Shroff, et al. Learning in Markov Decision Processes under Constraints, 2020, ArXiv.
[32] Paolo Toth, et al. Knapsack Problems: Algorithms and Computer Implementations, 1990.
[33] John E. Beasley. Multidimensional Knapsack Problems, 2009, Encyclopedia of Optimization.