Gabriel Dulac-Arnold | Todd Hester | Daniel J. Mankowitz
[1] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[2] Richard Evans,et al. Deep Reinforcement Learning in Large Discrete Action Spaces , 2015, ArXiv abs/1512.07679.
[3] Sepp Hochreiter,et al. RUDDER: Return Decomposition for Delayed Rewards , 2018, NeurIPS.
[4] András György,et al. Learning from Delayed Outcomes with Intermediate Observations , 2018, ArXiv.
[5] Oleg O. Sushkov,et al. A Practical Approach to Insertion with Variable Socket Position Using Deep Reinforcement Learning , 2018, 2019 International Conference on Robotics and Automation (ICRA).
[6] Sergey Levine,et al. Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL , 2018, ICLR.
[7] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..
[8] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[9] Yuxi Li,et al. Deep Reinforcement Learning , 2018, Reinforcement Learning for Cyber-Physical Systems.
[10] Henryk Michalewski,et al. Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes , 2018, ISC.
[11] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[12] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[13] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[14] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[15] Peter Stone,et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots , 2012, Machine Learning.
[16] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[17] Romain Laroche,et al. Hybrid Reward Architecture for Reinforcement Learning , 2017, NIPS.
[18] Ruben Villegas,et al. Learning Latent Dynamics for Planning from Pixels , 2018, ICML.
[19] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[20] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[21] Anca D. Dragan,et al. Inverse Reward Design , 2017, NIPS.
[22] Shie Mannor,et al. Reward Constrained Policy Optimization , 2018, ICLR.
[23] Garud Iyengar,et al. Robust Dynamic Programming , 2005, Math. Oper. Res..
[24] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[25] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[26] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[27] Shie Mannor,et al. Deep Robust Kalman Filter , 2017, ArXiv.
[28] Andreas Krause,et al. Safe Exploration in Finite Markov Decision Processes with Gaussian Processes , 2016, NIPS.
[29] Yan Wu,et al. Optimizing agent behavior over long time scales by transporting value , 2018, Nature Communications.
[30] Abhinav Verma,et al. Programmatically Interpretable Reinforcement Learning , 2018, ICML.
[31] Yisong Yue,et al. Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes , 2018, AAAI.
[32] Jianfeng Gao,et al. Deep Reinforcement Learning with a Natural Language Action Space , 2015, ACL.
[33] Shie Mannor,et al. Scaling Up Robust MDPs using Function Approximation , 2014, ICML.
[34] Shie Mannor,et al. Policy Gradients with Variance Related Risk Criteria , 2012, ICML.
[35] Kiri Wagstaff,et al. Machine Learning that Matters , 2012, ICML.
[36] Shie Mannor,et al. Learning Robust Options , 2018, AAAI.
[37] David Budden,et al. Distributed Prioritized Experience Replay , 2018, ICLR.
[38] Peter Stone,et al. RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for robot control , 2011, 2012 IEEE International Conference on Robotics and Automation.
[39] Shie Mannor,et al. Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning , 2018, NeurIPS.
[40] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[41] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[42] Shie Mannor,et al. Optimizing the CVaR via Sampling , 2014, AAAI.
[43] Paul Covington,et al. Deep Neural Networks for YouTube Recommendations , 2016, RecSys.
[44] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[45] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[46] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[47] E. Altman. Constrained Markov Decision Processes , 1999 .
[48] Shie Mannor,et al. Situational Awareness by Risk-Conscious Skills , 2016, ArXiv.
[49] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[50] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[51] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[52] Wojciech Samek,et al. Methods for interpreting and understanding deep neural networks , 2017, Digit. Signal Process..
[53] Mehrdad Farajtabar,et al. More Robust Doubly Robust Off-policy Evaluation , 2018, ICML.
[54] Yuval Tassa,et al. Safe Exploration in Continuous Action Spaces , 2018, ArXiv.
[55] Marcin Andrychowicz,et al. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[56] Craig Boutilier,et al. Budget Allocation using Weakly Coupled, Constrained Markov Decision Processes , 2016, UAI.
[57] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[58] Shie Mannor,et al. Soft-Robust Actor-Critic Policy-Gradient , 2018, UAI.
[59] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[60] Shie Mannor,et al. Policy Gradient for Coherent Risk Measures , 2015, NIPS.
[61] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012, ICML.
[62] Raia Hadsell,et al. Value constrained model-free continuous control , 2019, ArXiv.
[63] Rémi Munos,et al. Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.
[64] Giovanni De Magistris,et al. OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[65] Romain Laroche,et al. A Fitted-Q Algorithm for Budgeted MDPs , 2018, EWRL 2018.
[66] Pawel Wawrzynski,et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay , 2009, Neural Networks.
[67] Daniel G. Goldstein,et al. Manipulating and Measuring Model Interpretability , 2018, CHI.
[68] Ofir Nachum,et al. A Lyapunov-based Approach to Safe Reinforcement Learning , 2018, NeurIPS.
[69] Shimon Whiteson,et al. A Survey of Multi-Objective Sequential Decision-Making , 2013, J. Artif. Intell. Res..