Policy Optimization with Model-based Explorations
Qing He | Qing Da | Pingzhong Tang | Anxiang Zeng | Feiyang Pan | Qingpeng Cai | Hua-Lin He | Chun-Xiang Pan
[1] Richard Y. Chen,et al. UCB Exploration via Q-Ensembles , 2018.
[2] Stefan Schaal,et al. Policy Gradient Methods for Robotics , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[3] Thomas B. Schön,et al. From Pixels to Torques: Policy Learning with Deep Dynamical Models , 2015, ICML 2015.
[4] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[5] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[6] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[7] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[8] Pieter Abbeel,et al. Model-Ensemble Trust-Region Policy Optimization , 2018, ICLR.
[9] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[10] Yiwei Zhang,et al. Reinforcement Mechanism Design for Fraudulent Behaviour in e-Commerce , 2018, AAAI.
[11] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[12] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[13] Pingzhong Tang,et al. Generalized deterministic policy gradient algorithms , 2018, ArXiv.
[14] Sergey Levine,et al. Guided Policy Search via Approximate Mirror Descent , 2016, NIPS.
[15] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[16] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[17] Yiwei Zhang,et al. Reinforcement Mechanism Design for e-commerce , 2017, WWW.
[18] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[19] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933.
[20] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[21] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[22] David S. Leslie,et al. Optimistic Bayesian Sampling in Contextual-Bandit Problems , 2012, J. Mach. Learn. Res.
[23] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[24] Pingzhong Tang,et al. Reinforcement mechanism design , 2017, IJCAI.
[25] Pingzhong Tang,et al. Deterministic Policy Gradients With General State Transitions , 2018, arXiv:1807.03708.
[26] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[27] Filip De Turck,et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning , 2016, NIPS.
[28] Emanuel Todorov,et al. Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems , 2004, ICINCO.
[29] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[30] Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
[31] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[32] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res.
[33] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.