[1] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[2] Peter W. Glynn, et al. A large deviations perspective on ordinal optimization, 2004, Proceedings of the 2004 Winter Simulation Conference.
[3] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[4] E. Altman. Constrained Markov Decision Processes, 1999.
[5] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.
[6] Eric B. Laber, et al. Dynamic treatment regimes: Technical challenges and applications, 2014.
[7] Marco Pavone, et al. Risk-Constrained Reinforcement Learning with Percentile Risk Criteria, 2015, J. Mach. Learn. Res.
[8] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[9] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[10] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Manag. Sci.
[11] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[12] R. Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.
[13] Craig Boutilier, et al. Budget Allocation using Weakly Coupled, Constrained Markov Decision Processes, 2016, UAI.
[14] Loo Hay Lee, et al. Stochastic Simulation Optimization - An Optimal Computing Budget Allocation, 2010, System Engineering and Operations Research.
[15] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[16] Uriel G. Rothblum, et al. Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes, 2012, Math. Oper. Res.
[17] Yi Zhu, et al. Three asymptotic regimes for ranking and selection with general sample distributions, 2016, 2016 Winter Simulation Conference (WSC).
[18] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.