Nevena Lazic | Csaba Szepesvári | Yasin Abbasi-Yadkori | Gellért Weisz
[1] Bo Liu, et al. Regularized Off-Policy TD-Learning, 2012, NIPS.
[2] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[3] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[4] Sham M. Kakade, et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.
[5] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[6] Csaba Szepesvári, et al. Learning near-optimal policies with fitted policy iteration and a single sample path, 2005.
[7] Nevena Lazic, et al. Model-Free Linear Quadratic Control via Reduction to Expert Prediction, 2018, AISTATS.
[8] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[9] Dimitri P. Bertsekas, et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares, 2009, IEEE Transactions on Automatic Control.
[10] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[11] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[12] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[13] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.
[14] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[15] Benjamin Van Roy, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[16] Richard S. Sutton, et al. Multi-step Reinforcement Learning: A Unifying Algorithm, 2017, AAAI.
[17] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[18] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[19] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[20] Dimitri P. Bertsekas, et al. Error Bounds for Approximations from Projected Linear Equations, 2010, Math. Oper. Res.
[21] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[22] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[23] Peter L. Bartlett, et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction, 2019, ICML.
[24] Huizhen Yu, et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions, 2010, ICML.
[25] Alessandro Lazaric, et al. Finite-sample analysis of least-squares policy iteration, 2012, J. Mach. Learn. Res.
[26] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[27] Sebastian Thrun, et al. Active Exploration in Dynamic Environments, 1991, NIPS.
[28] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[29] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[30] Jürgen Schmidhuber, et al. Adaptive confidence and adaptive curiosity, 1991, Forschungsberichte, TU Munich.
[31] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[32] Shie Mannor, et al. Regularized Policy Iteration with Nonparametric Function Spaces, 2016, J. Mach. Learn. Res.
[33] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[34] Dimitri P. Bertsekas, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1997.
[35] Francesco Orabona, et al. Scale-Free Algorithms for Online Linear Optimization, 2015, ALT.
[36] Yuval Tassa, et al. DeepMind Control Suite, 2018, ArXiv.