Novel Policy Seeking with Constrained Optimization
Bolei Zhou | Dahua Lin | Bo Dai | Hao Sun | Zhenghao Peng | Jian Guo
[1] Zhenghao Peng, et al. Safe Exploration by Solving Early Terminated MDP, 2021, arXiv.
[2] Daniel Guo, et al. Agent57: Outperforming the Atari Human Benchmark, 2020, ICML.
[3] Sergey Levine, et al. Dynamics-Aware Unsupervised Discovery of Skills, 2019, ICLR.
[4] Jakub W. Pachocki, et al. Dota 2 with Large Scale Deep Reinforcement Learning, 2019, arXiv.
[5] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[6] Robert Loftin, et al. Better Exploration with Optimistic Actor-Critic, 2019, NeurIPS.
[7] Marcin Andrychowicz, et al. Solving Rubik's Cube with a Robot Hand, 2019, arXiv.
[8] Shie Mannor, et al. Distributional Policy Optimization: An Alternative Approach for Continuous Control, 2019, NeurIPS.
[9] Greg Turk, et al. Learning Novel Policies for Tasks, 2019, ICML.
[10] Richard Socher, et al. Competitive Experience Replay, 2019, ICLR.
[11] Marc G. Bellemare, et al. An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents, 2018, IJCAI.
[12] Nando de Freitas, et al. Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning, 2018, ICML.
[13] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[14] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.
[15] Qiang Liu, et al. Learning Self-Imitating Diverse Policies, 2018, ICLR.
[16] Sergey Levine, et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[17] Dario Amodei, et al. Benchmarking Safe Exploration in Deep Reinforcement Learning, 2019.
[18] Albin Cassirer, et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.
[19] Ofir Nachum, et al. A Lyapunov-based Approach to Safe Reinforcement Learning, 2018, NeurIPS.
[20] Joel Z. Leibo, et al. Inequity aversion improves cooperation in intertemporal social dilemmas, 2018, NeurIPS.
[21] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[22] Marcin Andrychowicz, et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research, 2018, arXiv.
[23] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[24] Kenneth O. Stanley, et al. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents, 2017, NeurIPS.
[25] Alexander Peysakhovich, et al. Consequentialist conditional cooperation in social dilemmas with imperfect information, 2017, AAAI Workshops.
[26] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[27] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[28] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[29] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.
[30] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[31] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, CVPR Workshops.
[32] Yang Liu, et al. Stein Variational Policy Gradient, 2017, UAI.
[33] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.
[34] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[35] Yuval Noah Harari. Sapiens: A Brief History of Humankind, 2014.
[36] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[37] Pieter Abbeel, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[38] Kenneth O. Stanley, et al. Quality Diversity: A New Frontier for Evolutionary Computation, 2016, Front. Robot. AI.
[39] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[40] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[41] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[42] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[43] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[44] J. Schulman, et al. Variational Information Maximizing Exploration, 2016.
[45] J. Henrich. The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter, 2015.
[46] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[47] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IROS.
[48] Ana Paiva, et al. Emerging social awareness: Exploring intrinsic motivation in multiagent learning, 2011, ICDL.
[49] Judith M. Burkart, et al. Social learning and evolution: the cultural intelligence hypothesis, 2011, Philosophical Transactions of the Royal Society B: Biological Sciences.
[50] Kenneth O. Stanley, et al. Novelty Search and the Problem with Objectives, 2011.
[51] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[52] C. Villani. Optimal Transport: Old and New, 2008.
[53] M. Grzes, et al. Plan-based reward shaping for reinforcement learning, 2008, IEEE International Conference on Intelligent Systems.
[54] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[55] G. B. Dantzig and M. N. Thapa. Linear Programming 2: Theory and Extensions, 2003, Springer.
[56] Flemming Topsøe, et al. Jensen-Shannon divergence and Hilbert space embedding, 2004, ISIT.
[57] Stephen J. Wright. Primal-Dual Interior-Point Methods, 1997.
[58] G. DeJong, et al. Theory and Application of Reward Shaping in Reinforcement Learning, 2004.
[59] Dominik Endres, et al. A new metric for probability distributions, 2003, IEEE Transactions on Information Theory.
[60] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[61] Stephen J. Wright. On the convergence of the Newton/log-barrier method, 2001, Math. Program.
[62] E. Deci, et al. Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions, 2000, Contemporary Educational Psychology.
[63] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[64] E. Altman. Constrained Markov Decision Processes, 1999.
[65] J. Herskovits. Feasible Direction Interior-Point Technique for Nonlinear Optimization, 1998.
[66] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[67] Nicholas I. M. Gould, et al. A globally convergent Lagrangian barrier algorithm for optimization with general inequality constraints and simple bounds, 1997, Math. Comput.
[68] B. Rogoff. Apprenticeship in Thinking: Cognitive Development in Social Context, 1990.
[69] Gunar E. Liepins, et al. Deceptiveness and Genetic Algorithm Dynamics, 1990, FOGA.
[70] Ludger Rüschendorf. The Wasserstein distance and approximation theorems, 1985.
[71] Andrzej Ruszczynski, et al. Feasible direction methods for stochastic programming problems, 1980, Math. Program.