Nir Levine | Gabriel Dulac-Arnold | Sven Gowal | Todd Hester | Jerry Li | Daniel J. Mankowitz | Cosmin Paduraru
[1] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[2] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..
[3] Shie Mannor,et al. Iterative Hierarchical Optimization for Misspecified Problems (IHOMP) , 2016, ArXiv.
[4] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[5] Sergey Levine,et al. Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL , 2018, ICLR.
[6] Henryk Michalewski,et al. Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes , 2018, ISC.
[7] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[8] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[9] Joseph A. Paradiso,et al. The gesture recognition toolkit , 2014, J. Mach. Learn. Res..
[10] Ofir Nachum,et al. A Lyapunov-based Approach to Safe Reinforcement Learning , 2018, NeurIPS.
[11] Yuval Tassa,et al. Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.
[12] Ann Nowé,et al. Multi-objective reinforcement learning using sets of pareto dominating policies , 2014, J. Mach. Learn. Res..
[13] OpenAI. Learning Dexterous In-Hand Manipulation , 2018, ArXiv.
[14] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[15] Paul Covington,et al. Deep Neural Networks for YouTube Recommendations , 2016, RecSys.
[16] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[17] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[18] Giovanni De Magistris,et al. OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[19] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[20] John Langford,et al. Making Contextual Decisions with Low Technical Debt , 2016 .
[21] Dean A. Pomerleau. ALVINN: An Autonomous Land Vehicle in a Neural Network , 1988, NIPS.
[22] Gabriel Dulac-Arnold,et al. Challenges of Real-World Reinforcement Learning , 2019, ArXiv.
[23] Shie Mannor,et al. Policy Gradient for Coherent Risk Measures , 2015, NIPS.
[24] Raia Hadsell,et al. Value constrained model-free continuous control , 2019, ArXiv.
[25] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[26] Matthew W. Hoffman,et al. Distributed Distributional Deterministic Policy Gradients , 2018, ICLR.
[27] Romain Laroche,et al. Hybrid Reward Architecture for Reinforcement Learning , 2017, NIPS.
[28] Shie Mannor,et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.
[29] Ang Li,et al. Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control , 2020, ICLR.
[30] Tom Schaul,et al. Unicorn: Continual Learning with a Universal, Off-policy Agent , 2018, ArXiv.
[31] Shie Mannor,et al. Scaling Up Robust MDPs using Function Approximation , 2014, ICML.
[32] Shie Mannor,et al. A Bayesian Approach to Robust Reinforcement Learning , 2019, UAI.
[33] Sergey Levine,et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning , 2019, ArXiv.
[34] Richard Evans,et al. Deep Reinforcement Learning in Large Discrete Action Spaces , 2015, ArXiv.
[35] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.
[36] Xiaohui Ye,et al. Horizon: Facebook's Open Source Applied Reinforcement Learning Platform , 2018, ArXiv.
[37] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[38] Sepp Hochreiter,et al. RUDDER: Return Decomposition for Delayed Rewards , 2018, NeurIPS.
[39] András György,et al. Learning from Delayed Outcomes with Intermediate Observations , 2018, ArXiv.
[40] Oleg O. Sushkov,et al. A Practical Approach to Insertion with Variable Socket Position Using Deep Reinforcement Learning , 2018, 2019 International Conference on Robotics and Automation (ICRA).
[41] Tor Lattimore,et al. Behaviour Suite for Reinforcement Learning , 2019, ICLR.
[42] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[43] Oleg O. Sushkov,et al. Scaling data-driven robotics with reward sketching and batch reinforcement learning , 2019, Robotics: Science and Systems.
[44] Yifan Wu,et al. Behavior Regularized Offline Reinforcement Learning , 2019, ArXiv.
[45] Peter Stone,et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots , 2012, Machine Learning.
[46] Yisong Yue,et al. Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes , 2018, AAAI.
[47] Kiri Wagstaff,et al. Machine Learning that Matters , 2012, ICML.
[48] Shie Mannor,et al. Learning Robust Options , 2018, AAAI.
[49] Robert Babuska,et al. Experience Replay for Real-Time Reinforcement Learning Control , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[50] Che Wang,et al. BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning , 2020, NeurIPS.
[51] Peter Spirtes,et al. An Anytime Algorithm for Causal Inference , 2001, AISTATS.
[52] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[53] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[54] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[55] Romain Laroche,et al. A Fitted-Q Algorithm for Budgeted MDPs , 2018, EWRL.
[56] Yan Wu,et al. Optimizing agent behavior over long time scales by transporting value , 2018, Nature Communications.
[57] E. Altman. Constrained Markov Decision Processes , 1999 .
[58] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[59] G. Konidaris,et al. Learning to Plan with Portable Symbols , 2018 .
[60] Martin A. Riedmiller,et al. Robust Reinforcement Learning for Continuous Control with Model Misspecification , 2019, ICLR.
[61] Yuval Tassa,et al. Safe Exploration in Continuous Action Spaces , 2018, ArXiv.
[62] Chris Pal,et al. Real-Time Reinforcement Learning , 2019, NeurIPS.
[63] Shie Mannor,et al. Situational Awareness by Risk-Conscious Skills , 2016, ArXiv.
[64] Rui Wang,et al. Deep Reinforcement Learning for Multiobjective Optimization , 2019, IEEE Transactions on Cybernetics.
[65] David Budden,et al. Distributed Prioritized Experience Replay , 2018, ICLR.
[66] Shimon Whiteson,et al. A Survey of Multi-Objective Sequential Decision-Making , 2013, J. Artif. Intell. Res..
[67] Shie Mannor,et al. Deep Robust Kalman Filter , 2017, ArXiv.
[68] Sergey Levine,et al. Deep Dynamics Models for Learning Dexterous Manipulation , 2019, CoRL.
[69] Martin A. Riedmiller,et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning , 2020, ICLR.
[70] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[71] Jun Wang,et al. Real-Time Bidding by Reinforcement Learning in Display Advertising , 2017, WSDM.
[72] Pieter Abbeel,et al. Autonomous Helicopter Aerobatics through Apprenticeship Learning , 2010, Int. J. Robotics Res..
[73] Mitsuo Kawato,et al. Multiple Model-Based Reinforcement Learning , 2002, Neural Computation.
[74] Craig Boutilier,et al. Budget Allocation using Weakly Coupled, Constrained Markov Decision Processes , 2016, UAI.
[75] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[76] Shie Mannor,et al. Soft-Robust Actor-Critic Policy-Gradient , 2018, UAI.
[77] Dale Schuurmans,et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning , 2019, ArXiv.
[78] Sergey Levine,et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction , 2019, NeurIPS.
[79] Luc De Raedt,et al. Anytime Inference in Probabilistic Logic Programs with Tp-Compilation , 2015, IJCAI.
[80] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[81] Henry Zhu,et al. ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots , 2019, CoRL.
[82] Natasha Jaques,et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog , 2019, ArXiv.
[83] Marcin Andrychowicz,et al. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[84] Shie Mannor,et al. Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces , 2019, ArXiv.
[85] Shie Mannor,et al. Probabilistic Goal Markov Decision Processes , 2011, IJCAI.
[86] Jun Wang,et al. Real-Time Bidding: A New Frontier of Computational Advertising Research , 2015, WSDM.
[87] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[88] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[89] Anca D. Dragan,et al. Inverse Reward Design , 2017, NIPS.
[90] Shie Mannor,et al. Reward Constrained Policy Optimization , 2018, ICLR.
[91] Garud Iyengar,et al. Robust Dynamic Programming , 2005, Math. Oper. Res..
[92] Andreas Krause,et al. Safe Exploration in Finite Markov Decision Processes with Gaussian Processes , 2016, NIPS.
[93] Shie Mannor,et al. Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning , 2018, NeurIPS.
[94] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[95] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[96] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[97] Shie Mannor,et al. Optimizing the CVaR via Sampling , 2014, AAAI.
[98] Dario Amodei,et al. Benchmarking Safe Exploration in Deep Reinforcement Learning , 2019 .
[99] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[100] Ruben Villegas,et al. Learning Latent Dynamics for Planning from Pixels , 2018, ICML.
[101] Runzhe Yang,et al. A Generalized Algorithm for Multi-Objective RL and Policy Adaptation , 2019 .
[102] Shie Mannor,et al. Adaptive Skills Adaptive Partitions (ASAP) , 2016, NIPS.
[103] Jianfeng Gao,et al. Deep Reinforcement Learning with a Natural Language Action Space , 2015, ACL.
[104] Qing Wang,et al. Exponentially Weighted Imitation Learning for Batched Historical Data , 2018, NeurIPS.
[105] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[106] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[107] Craig Boutilier,et al. RecSim: A Configurable Simulation Platform for Recommender Systems , 2019, ArXiv.
[108] A. Cassandra. A Survey of POMDP Applications , 2003 .
[109] Rémi Munos,et al. Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.
[110] Leslie Pack Kaelbling,et al. From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning , 2018, J. Artif. Intell. Res..
[111] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[112] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..