Hui Wu, Wenjie Shi, Gao Huang, Shiji Song, Ya-Chu Hsu, Cheng Wu
[1] K. Deimling. Fixed Point Theory, 2008.
[2] Alexandre d'Aspremont, et al. Regularized nonlinear acceleration, 2016, Mathematical Programming.
[3] Sergey Levine, et al. Learning deep neural network policies with continuous memory states, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).
[4] Matthieu Geist, et al. Anderson Acceleration for Reinforcement Learning, 2018, EWRL 2018.
[5] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[6] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[7] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.
[8] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[9] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[10] Honglak Lee, et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[11] Cheng Wu, et al. Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning, 2019, IJCAI.
[12] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[13] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[14] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[15] Nolan Wagener, et al. Information theoretic MPC for model-based reinforcement learning, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[16] Zhihua Zhang, et al. Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks, 2018, ArXiv.
[17] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[18] Ravi Varadhan, et al. Damped Anderson Acceleration With Restarts and Monotonicity Control for Accelerating EM and EM-like Algorithms, 2018, Journal of Computational and Graphical Statistics.
[19] Cheng Wu, et al. Multi Pseudo Q-Learning-Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles, 2019, IEEE Transactions on Neural Networks and Learning Systems.
[20] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[21] Homer F. Walker, et al. Anderson Acceleration for Fixed-Point Iterations, 2011, SIAM J. Numer. Anal.
[22] Razvan Pascanu, et al. Learning to Navigate in Complex Environments, 2016, ICLR.
[23] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[24] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[25] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[26] Sergey Levine, et al. Path integral guided policy search, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[27] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[28] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[29] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[30] Sergey Levine, et al. Data-Efficient Hierarchical Reinforcement Learning, 2018, NeurIPS.
[31] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[32] C. T. Kelley, et al. Convergence Analysis for Anderson Acceleration, 2015, SIAM J. Numer. Anal.
[33] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.