Discretizing Continuous Action Space for On-Policy Optimization

In this work, we show that discretizing the action space for continuous control is a simple yet powerful technique for on-policy optimization. The explosion in the number of discrete actions can be handled efficiently by a policy whose distribution factorizes across action dimensions. We show that the discrete policy achieves significant performance gains with state-of-the-art on-policy optimization algorithms (PPO, TRPO, ACKTR), especially on high-dimensional tasks with complex dynamics. Additionally, we show that an ordinal parameterization of the discrete distribution can introduce an inductive bias that encodes the natural ordering between discrete actions. This ordinal architecture further improves the performance of PPO/TRPO significantly.
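A minimal sketch of the factorized discrete policy idea described above (not the authors' code): each continuous action dimension is split into K bins, and the network outputs independent categorical logits per dimension, so the joint distribution factorizes and the output size grows linearly in the number of dimensions rather than exponentially. The layer sizes, bin count, and class/method names below are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class FactorizedDiscretePolicy(nn.Module):
    """Discretize each action dimension into `num_bins` bins and model the
    joint action distribution as a product of independent per-dimension
    categoricals. Illustrative sketch; sizes and names are assumptions."""

    def __init__(self, obs_dim, act_dim, num_bins=11, hidden=64,
                 act_low=-1.0, act_high=1.0):
        super().__init__()
        self.act_dim, self.num_bins = act_dim, num_bins
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            # One logit for every (action dimension, bin) pair.
            nn.Linear(hidden, act_dim * num_bins),
        )
        # Evenly spaced atoms that map bin indices back to continuous actions.
        self.register_buffer("atoms", torch.linspace(act_low, act_high, num_bins))

    def distribution(self, obs):
        logits = self.net(obs).view(-1, self.act_dim, self.num_bins)
        # Batch shape (batch, act_dim): one categorical per action dimension.
        return Categorical(logits=logits)

    def sample(self, obs):
        dist = self.distribution(obs)
        idx = dist.sample()                    # (batch, act_dim) bin indices
        log_prob = dist.log_prob(idx).sum(-1)  # sum over dims: factorized joint
        action = self.atoms[idx]               # bins -> continuous action values
        return action, log_prob
```

The summed per-dimension log-probabilities can then be plugged into a PPO/TRPO surrogate objective exactly as a Gaussian policy's log-probabilities would be; the ordinal variant mentioned in the abstract would replace the raw per-dimension logits with a parameterization that couples neighboring bins so the natural ordering of the discretized actions is reflected in the distribution.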
