Policy Search in Continuous Action Domains: an Overview
