[1] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[2] Jan Peters, et al. Imitation and Reinforcement Learning, 2010, IEEE Robotics & Automation Magazine.
[3] Alexander Shapiro, et al. Lectures on Stochastic Programming: Modeling and Theory, 2009.
[4] Wotao Yin, et al. A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion, 2013, SIAM J. Imaging Sci.
[5] Koray Kavukcuoglu, et al. PGQ: Combining policy gradient and Q-learning, 2016, ArXiv.
[6] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[7] Xiaohui Ye, et al. Horizon: Facebook's Open Source Applied Reinforcement Learning Platform, 2018, ArXiv.
[8] C. Villani, et al. Weighted Csiszár-Kullback-Pinsker inequalities and applications to transportation inequalities, 2005.
[9] Richard S. Sutton, et al. Weighted importance sampling for off-policy learning with linear function approximation, 2014, NIPS.
[10] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[11] Daniele Calandriello, et al. Safe Policy Iteration, 2013, ICML.
[12] Xi Chen, et al. Large-Scale Markov Decision Problems via the Linear Programming Dual, 2019, ArXiv.
[13] Marek Petrik, et al. Safe Policy Improvement by Minimizing Robust Baseline Regret, 2016, NIPS.
[14] John B. Moore, et al. Infinite-dimensional quadratic optimization: Interior-point methods and control applications, 1997.
[15] Vikash Kumar, et al. Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real, 2019, CoRL.
[16] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[17] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[18] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[19] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[20] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[21] J. Zico Kolter, et al. OptNet: Differentiable Optimization as a Layer in Neural Networks, 2017, ICML.
[22] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[23] Bernhard Schölkopf, et al. A Kernel Approach to Comparing Distributions, 2007, AAAI.
[24] Romain Laroche, et al. Safe Policy Improvement with Baseline Bootstrapping, 2017, ICML.
[25] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[26] Ethan Knight, et al. Natural Gradient Deep Q-learning, 2018, ArXiv.
[27] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[28] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[29] Razvan Pascanu, et al. Policy Distillation, 2015, ICLR.
[30] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[31] Leslie Pack Kaelbling, et al. Residual Policy Learning, 2018, ArXiv.
[32] Joelle Pineau, et al. Benchmarking Batch Deep Reinforcement Learning Algorithms, 2019, ArXiv.
[33] Sergey Levine, et al. Residual Reinforcement Learning for Robot Control, 2018, 2019 International Conference on Robotics and Automation (ICRA).
[34] Koray Kavukcuoglu, et al. Combining policy gradient and Q-learning, 2016, ICLR.
[35] D. Hunter, et al. A Tutorial on MM Algorithms, 2004.