John Schulman | Philipp Moritz | Sergey Levine | Michael I. Jordan | Pieter Abbeel
[1] L. S. Kogan. Review of Principles of Behavior, 1943.
[2] B. Skinner, et al. Principles of Behavior, 1944.
[3] Marvin Minsky. Steps toward Artificial Intelligence, 1961, Proceedings of the IRE.
[4] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[5] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[6] Shigenobu Kobayashi, et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function, 1998, ICML.
[7] Stephen J. Wright, et al. Numerical Optimization, 1999, Springer.
[8] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[9] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[10] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[11] J. W. Nieuwenhuis. Book review of D.P. Bertsekas (ed.), Dynamic Programming and Optimal Control - Volume 2, 1999.
[12] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.
[13] Sham M. Kakade, et al. Optimizing Average Reward Using Discounted Rewards, 2001, COLT/EuroCOLT.
[14] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[15] D. K. Smith, et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[16] John N. Tsitsiklis, et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes, 2003, Discret. Event Dyn. Syst.
[17] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[18] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[19] H. Sebastian Seung, et al. Stochastic policy gradient reinforcement learning on a simple 3D biped, 2004, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[20] Florentin Wörgötter, et al. Fast biped walking with a reflexive controller and real-time policy searching, 2005, NIPS.
[21] Stefan Schaal, et al. Natural Actor-Critic, 2008, Neurocomputing.
[22] Zoran Popovic, et al. Optimal gait and form for animal locomotion, 2009, ACM Trans. Graph.
[23] Pawel Wawrzynski, et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay, 2009, Neural Networks.
[24] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[25] Martin A. Riedmiller, et al. Reinforcement learning in feedback control, 2011, Machine Learning.
[26] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[27] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[28] Philip Thomas, et al. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[29] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[30] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[31] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[32] Xinyun Chen, et al. Delving into Transferable Adversarial Examples and Black-box Attacks, 2017, ICLR.
[33] Omer Levy, et al. Simulating Action Dynamics with Neural Process Networks, 2018, ICLR.