Policy Search in Continuous Action Domains: an Overview
[1] Olivier Sigaud,et al. First-order and second-order variants of the gradient descent: a unified framework , 2018, ArXiv.
[2] Olivier Sigaud,et al. CEM-RL: Combining evolutionary and gradient-based methods for policy search , 2018, ICLR.
[3] Olivier Sigaud,et al. Importance mixing: Improving sample reuse in evolutionary policy search methods , 2018, ArXiv.
[4] Pierre-Yves Oudeyer,et al. Curiosity Driven Exploration of Learned Disentangled Goal Spaces , 2018, CoRL.
[5] Satinder Singh,et al. Many-Goals Reinforcement Learning , 2018, ArXiv.
[6] Kagan Tumer,et al. Evolutionary Reinforcement Learning , 2018, NeurIPS.
[7] Sergey Levine,et al. Data-Efficient Hierarchical Reinforcement Learning , 2018, NeurIPS.
[8] Kate Saenko,et al. Hierarchical Reinforcement Learning with Hindsight , 2018, ArXiv.
[9] Kagan Tumer,et al. Evolution-Guided Policy Gradient in Reinforcement Learning , 2018, NeurIPS.
[10] Benjamin Recht,et al. Simple random search provides a competitive approach to reinforcement learning , 2018, ArXiv.
[11] Matthieu Zimmer,et al. Bootstrapping Q-Learning for Robotics From Neuro-Evolution Results , 2018, IEEE Transactions on Cognitive and Developmental Systems.
[12] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.
[13] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[14] Frank Hutter,et al. Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari , 2018, IJCAI.
[15] Pierre-Yves Oudeyer,et al. Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration , 2018, ICLR.
[16] Kamyar Azizzadenesheli,et al. Efficient Exploration Through Bayesian Deep Q-Networks , 2018, 2018 Information Theory and Applications Workshop (ITA).
[17] David Filliat,et al. State Representation Learning for Control: An Overview , 2018, Neural Networks.
[18] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[19] Kenneth O. Stanley,et al. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents , 2017, NeurIPS.
[20] Kenneth O. Stanley,et al. ES is more than just a traditional finite-difference approximator , 2017, GECCO.
[21] Jian Peng,et al. Policy Optimization by Genetic Distillation , 2017, ICLR.
[22] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[23] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[24] Pierre-Yves Oudeyer,et al. GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms , 2017, ICML.
[25] Shane Legg,et al. Noisy Networks for Exploration , 2017, ICLR.
[26] Marcin Andrychowicz,et al. Parameter Space Noise for Exploration , 2017, ICLR.
[27] Yiannis Demiris,et al. Quality and Diversity Optimization: A Unifying Modular Framework , 2017, IEEE Transactions on Evolutionary Computation.
[28] Kenneth O. Stanley,et al. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning , 2017, ArXiv.
[29] Kenneth O. Stanley,et al. On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent , 2017, ArXiv.
[30] Yunhao Tang,et al. Variational Deep Q Network , 2017, ArXiv.
[31] Marc Peter Deisenroth,et al. Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.
[32] Jian Peng,et al. Genetic Policy Optimization , 2017, ICLR.
[33] Pierre-Yves Oudeyer,et al. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning , 2017, J. Mach. Learn. Res..
[34] Elman Mansimov,et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation , 2017, NIPS.
[35] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[36] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[37] Peter Henderson,et al. Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control , 2017, ArXiv.
[38] Richard E. Turner,et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.
[39] Oliver Brock,et al. Unsupervised Learning of State Representations for Multiple Tasks , 2017 .
[40] Jean-Baptiste Mouret,et al. Black-box data-efficient policy search for robotics , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[41] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[42] Tom Schaul,et al. FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.
[43] Trevor Darrell,et al. Loss is its own Reward: Self-Supervision for Reinforcement Learning , 2016, ICLR.
[44] Zeb Kurth-Nelson,et al. Learning to reinforcement learn , 2016, CogSci.
[45] Koray Kavukcuoglu,et al. Combining policy gradient and Q-learning , 2016, ICLR.
[46] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[47] Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.
[48] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[49] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[50] Anne Auger,et al. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles , 2011, J. Mach. Learn. Res..
[51] Koray Kavukcuoglu,et al. PGQ: Combining policy gradient and Q-learning , 2016, ArXiv.
[52] Pierre-Yves Oudeyer,et al. Overlapping waves in tool use development: A curiosity-driven computational model , 2016, 2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob).
[53] Mohamed Chetouani,et al. Training a robot with evaluative feedback and unlabeled guidance signals , 2016, 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN).
[54] Sergey Levine,et al. Guided Policy Search via Approximate Mirror Descent , 2016, NIPS.
[55] Olivier Sigaud,et al. Actor-critic versus direct policy search: a comparison based on sample complexity , 2016, ArXiv.
[56] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[57] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[58] Joshua B. Tenenbaum,et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.
[59] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[60] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[61] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[62] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[63] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[64] Olivier Sigaud,et al. Many regression algorithms, one unified model: A review , 2015, Neural Networks.
[65] Kenneth O. Stanley,et al. Confronting the Challenge of Quality Diversity , 2015, GECCO.
[66] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[67] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[68] Yongxin Yang,et al. A Unified Perspective on Multi-Domain and Multi-Task Learning , 2014, ICLR.
[69] Antoine Cully,et al. Robots that can adapt like animals , 2014, Nature.
[70] J. H. Metzen,et al. Bayesian Optimization for Contextual Policy Search , 2015.
[71] Hannes Sommer,et al. ROCK∗ — Efficient black-box optimization for policy learning , 2014, 2014 IEEE-RAS International Conference on Humanoid Robots.
[72] Pierre-Yves Oudeyer,et al. The effects of task difficulty, novelty and the size of the search space on intrinsically motivated exploration , 2014, Front. Neurosci..
[73] Oliver Brock,et al. State Representation Learning in Robotics: Using Prior Knowledge about Physical Interaction , 2014, Robotics: Science and Systems.
[74] Stéphane Doncieux,et al. Beyond black-box optimization: a review of selective pressures for evolutionary robotics , 2014, Evol. Intell..
[75] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[76] Jan Peters,et al. Bayesian Gait Optimization for Bipedal Locomotion , 2014, LION.
[77] Alan Fern,et al. Using trajectory data to improve bayesian optimization for reinforcement learning , 2014, J. Mach. Learn. Res..
[78] Olivier Sigaud,et al. Robot Skill Learning: From Reinforcement Learning to Evolution Strategies , 2013, Paladyn J. Behav. Robotics.
[79] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[80] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[81] Jun Nakanishi,et al. Dynamical Movement Primitives: Learning Attractor Models for Motor Behaviors , 2013, Neural Computation.
[82] Pierre-Yves Oudeyer,et al. Active learning of inverse models with intrinsically motivated goal exploration in robots , 2013, Robotics Auton. Syst..
[84] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[85] Robert Babuska,et al. A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[86] Olivier Sigaud,et al. Policy Improvement Methods: Between Black-Box Optimization and Episodic Reinforcement Learning , 2012 .
[87] Olivier Sigaud,et al. Path Integral Policy Improvement with Covariance Matrix Adaptation , 2012, ICML.
[88] Léon Bottou,et al. Stochastic Gradient Descent Tricks , 2012, Neural Networks: Tricks of the Trade.
[89] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[90] Gerhard Neumann,et al. Variational Inference for Policy Search in changing situations , 2011, ICML.
[91] Kenneth O. Stanley,et al. Abandoning Objectives: Evolution Through the Search for Novelty Alone , 2011, Evolutionary Computation.
[92] Faustino J. Gomez,et al. When Novelty Is Not Enough , 2011, EvoApplications.
[93] Nando de Freitas,et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning , 2010, ArXiv.
[94] Pierre-Yves Oudeyer,et al. Intrinsically motivated goal exploration for active motor learning in robots: A case study , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[95] Isao Ono,et al. Bidirectional Relation between CMA Evolution Strategies and Natural Evolution Strategies , 2010, PPSN.
[96] Tom Schaul,et al. Exponential natural evolution strategies , 2010, GECCO '10.
[97] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[98] Frank Sehnke,et al. Parameter-exploring policy gradients , 2010, Neural Networks.
[99] Olivier Buffet,et al. Markov Decision Processes in Artificial Intelligence , 2010 .
[100] Stefan Schaal,et al. A Generalized Path Integral Control Approach to Reinforcement Learning , 2010, J. Mach. Learn. Res..
[101] Tom Schaul,et al. Efficient natural evolution strategies , 2009, GECCO.
[102] Jan Peters,et al. Learning motor primitives for robotics , 2009, 2009 IEEE International Conference on Robotics and Automation.
[103] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[104] Julian Togelius,et al. Ontogenetic and Phylogenetic Reinforcement Learning , 2009, Künstliche Intell.
[105] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[106] Stefan Schaal,et al. Reinforcement learning of motor skills with policy gradients , 2008, Neural Networks.
[107] Dario Floreano,et al. Neuroevolution: from architectures to learning , 2008, Evol. Intell..
[108] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[109] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[110] Martin A. Riedmiller,et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[111] Tao Wang,et al. Automatic Gait Optimization with Gaussian Process Regression , 2007, IJCAI.
[112] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[113] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[114] Dirk P. Kroese,et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning , 2004 .
[115] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[117] Risto Miikkulainen,et al. Efficient evolution of neural network topologies , 2002, Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600).
[119] Goldberg,et al. Genetic algorithms , 1993, Robust Control Systems with Genetic Algorithms.
[120] J. A. Lozano,et al. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation , 2001 .
[121] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[122] Nikolaus Hansen,et al. Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.
[123] Michael Kearns,et al. Bias-Variance Error Bounds for Temporal Difference Updates , 2000, COLT.
[124] D. Goldberg,et al. BOA: the Bayesian optimization algorithm , 1999 .
[125] Satinder P. Singh,et al. Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes , 1998, NIPS.
[126] Thomas Bäck,et al. Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms , 1996 .
[127] Sebastian Thrun,et al. Lifelong robot learning , 1993, Robotics Auton. Syst..
[128] L. C. Baird,et al. Reinforcement learning in continuous time: advantage updating , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[129] John R. Koza,et al. Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.
[130] David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .
[131] Philip E. Gill,et al. Practical optimization , 1981 .