Zhenghao Peng | Bolei Zhou | Jian Guo | Bo Dai | Dahua Lin | Hao Sun
[1] Flemming Topsøe, et al. Jensen-Shannon divergence and Hilbert space embedding, 2004, International Symposium on Information Theory (ISIT 2004), Proceedings.
[2] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[3] Stephen J. Wright, et al. Primal-Dual Interior-Point Methods, 1997.
[4] Andrzej Ruszczynski, et al. Feasible direction methods for stochastic programming problems, 1980, Math. Program.
[5] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[6] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[7] E. Altman. Constrained Markov Decision Processes, 1999.
[8] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[9] Robert Loftin, et al. Better Exploration with Optimistic Actor-Critic, 2019, NeurIPS.
[10] Marc G. Bellemare, et al. An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents, 2018, IJCAI.
[11] Pieter Abbeel, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[12] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[13] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[14] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[15] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[16] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[17] J. Herskovits. Feasible Direction Interior-Point Technique for Nonlinear Optimization, 1998.
[18] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[19] Sergey Levine, et al. Dynamics-Aware Unsupervised Discovery of Skills, 2019, ICLR.
[20] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.
[21] Nicholas I. M. Gould, et al. A globally convergent Lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds, 1997, Math. Comput.
[22] Kenneth O. Stanley, et al. Quality Diversity: A New Frontier for Evolutionary Computation, 2016, Front. Robot. AI.
[23] Qiang Liu, et al. Learning Self-Imitating Diverse Policies, 2018, ICLR.
[24] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[25] Ludger Rüschendorf. The Wasserstein distance and approximation theorems, 1985.
[26] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[27] Dominik Endres, et al. A new metric for probability distributions, 2003, IEEE Transactions on Information Theory.
[28] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[29] Yang Liu, et al. Stein Variational Policy Gradient, 2017, UAI.
[30] Greg Turk, et al. Learning Novel Policies For Tasks, 2019, ICML.
[31] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[32] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[33] Shie Mannor, et al. Distributional Policy Optimization: An Alternative Approach for Continuous Control, 2019, NeurIPS.
[34] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[35] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.
[36] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[37] D. Griffel. Linear programming 2: Theory and extensions, by G. B. Dantzig and M. N. Thapa. Pp. 408. £50.00. 2003. ISBN 0 387 00834 9 (Springer), 2004, The Mathematical Gazette.
[38] Sergey Levine, et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[39] Ofir Nachum, et al. A Lyapunov-based Approach to Safe Reinforcement Learning, 2018, NeurIPS.
[40] Marcin Andrychowicz, et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research, 2018, arXiv.
[41] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[42] Kenneth O. Stanley, et al. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents, 2017, NeurIPS.
[43] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[44] Richard Socher, et al. Competitive Experience Replay, 2019, ICLR.
[45] Stephen J. Wright. On the convergence of the Newton/log-barrier method, 2001, Math. Program.
[46] C. Villani. Optimal Transport: Old and New, 2008.
[47] Albin Cassirer, et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.
[48] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.
[49] Dario Amodei, et al. Benchmarking Safe Exploration in Deep Reinforcement Learning, 2019.
[50] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[51] Kenneth O. Stanley, et al. Novelty Search and the Problem with Objectives, 2011.
[52] J. Schulman, et al. Variational Information Maximizing Exploration, 2016.
[53] Gunar E. Liepins, et al. Deceptiveness and Genetic Algorithm Dynamics, 1990, FOGA.
[54] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.