Maybe a few considerations in Reinforcement Learning Research