Exponentially Weighted Imitation Learning for Batched Historical Data
Qing Wang | Lei Han | Peng Sun | Han Liu | Tong Zhang | Jiechao Xiong
[1] Philip S. Thomas et al. High Confidence Policy Improvement, 2015, ICML.
[2] Yuval Tassa et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[3] Nando de Freitas et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[4] John Langford et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[5] Pieter Abbeel et al. Constrained Policy Optimization, 2017, ICML.
[6] Philip S. Thomas et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[7] Javier García et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[8] Han Liu et al. Feedback-Based Tree Search for Reinforcement Learning, 2018, ICML.
[9] Honglak Lee et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, 2014, NIPS.
[10] Claude Sammut et al. A Framework for Behavioural Cloning, 1995, Machine Intelligence 15.
[11] Yasemin Altun et al. Relative Entropy Policy Search, 2010.
[12] Sepp Hochreiter et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[13] Sergey Levine et al. Trust Region Policy Optimization, 2015, ICML.
[14] Doina Precup et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[15] Yuval Tassa et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[16] Pieter Abbeel et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[17] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[18] Geoffrey J. Gordon et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[19] Tom Schaul et al. Deep Q-learning From Demonstrations, 2017, AAAI.
[20] Imre Csiszár et al. Information Theory: Coding Theorems for Discrete Memoryless Systems, Second Edition, 2011.
[21] Qing Wang et al. Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space, 2018, arXiv.
[22] Alec Radford et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[23] Hilbert J. Kappen et al. Dynamic policy programming, 2010, J. Mach. Learn. Res.
[24] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[25] Marek Petrik et al. Safe Policy Improvement by Minimizing Robust Baseline Regret, 2016, NIPS.
[26] Elman Mansimov et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[27] Matthieu Geist et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[28] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[29] Daniele Calandriello et al. Safe Policy Iteration, 2013, ICML.
[30] Alex Graves et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[31] Nan Jiang et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[32] Marc G. Bellemare et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[33] Peter Stone et al. Deep Reinforcement Learning in Parameterized Action Space, 2015, ICLR.