[1] Jens Kober and Jan Peters. Policy Search for Motor Primitives in Robotics, 2008, NIPS.
[2] Chen Liang, et al. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision, 2016, ACL.
[3] Richard S. Sutton. Temporal credit assignment in reinforcement learning, 1984.
[4] Junhyuk Oh, et al. Self-Imitation Learning, 2018, ICML.
[5] Chelsea Finn, et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, 2016, ICML.
[6] Edoardo Conti, et al. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents, 2017, NeurIPS.
[7] Marc G. Bellemare, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[8] Yang Liu, et al. Stein Variational Policy Gradient, 2017, UAI.
[9] Chen Liang, et al. Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing, 2018, NeurIPS.
[10] Jan Peters and Stefan Schaal. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML.
[11] Marc Peter Deisenroth, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[12] Ian J. Goodfellow, et al. Generative Adversarial Nets, 2014, NIPS.
[13] David Silver, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[14] Mel Vecerik, et al. Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards, 2017, ArXiv.
[15] Carlos Florensa, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[16] Felipe Petroski Such, et al. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning, 2017, ArXiv.
[17] Alexander L. Strehl and Michael L. Littman. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[18] Volodymyr Mnih, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[19] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[20] Qiang Liu and Dilin Wang. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, 2016, NIPS.
[21] Timothy P. Lillicrap, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[22] Rein Houthooft, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[23] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[24] Jonathan Ho and Stefano Ermon. Generative Adversarial Imitation Learning, 2016, NIPS.
[25] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[26] Todd Hester, et al. Deep Q-learning From Demonstrations, 2017, AAAI.
[27] Daniel A. Abolafia, et al. Neural Program Synthesis with Priority Queue Training, 2018, ArXiv.
[28] Chen Liang, et al. Memory Augmented Policy Optimization for Program Synthesis with Generalization, 2018, ArXiv.
[29] Scott Fujimoto, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[30] Bradly C. Stadie, et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015, ArXiv.
[31] Yan Duan, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[32] Tim Salimans, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.
[33] Stéphane Ross, et al. Learning monocular reactive UAV control in cluttered natural environments, 2012, 2013 IEEE International Conference on Robotics and Automation.
[34] Haoran Tang, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[35] Justin Fu, et al. EX2: Exploration with Exemplar Models for Deep Reinforcement Learning, 2017, NIPS.
[36] Jan Peters, et al. Relative Entropy Policy Search, 2010.
[37] Benjamin Eysenbach, et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[38] Markus Wulfmeier, et al. Mutual Alignment Transfer Learning, 2017, CoRL.
[39] J. Hammersley. Simulation and the Monte Carlo Method, 1982.
[40] Ashvin Nair, et al. Overcoming Exploration in Reinforcement Learning with Demonstrations, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[41] John Schulman, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[42] Sergey Levine and Vladlen Koltun. Guided Policy Search, 2013, ICML.
[43] Paul F. Christiano, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[44] Richard S. Sutton, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[45] Mariusz Bojarski, et al. End to End Learning for Self-Driving Cars, 2016, ArXiv.
[46] John Schulman, et al. Trust Region Policy Optimization, 2015, ICML.
[47] Volodymyr Mnih, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[48] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[49] O. Chapelle, et al. Semi-Supervised Learning (Chapelle, O. et al., Eds.; 2006) [Book reviews], 2009, IEEE Transactions on Neural Networks.
[50] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.