Model-Predictive Control via Cross-Entropy and Gradient-Based Optimization

Recent works in high-dimensional model-predictive control and model-based reinforcement learning with learned dynamics and reward models have resorted to population-based optimization methods, such as the Cross-Entropy Method (CEM), for planning a sequence of actions. To decide on an action to take, CEM searches for the action sequence with the highest return according to the dynamics and reward models. Action sequences are typically sampled at random from an unconditional Gaussian distribution and evaluated on the environment. This distribution is iteratively updated towards action sequences with higher returns. However, this planning method can be very inefficient, especially for high-dimensional action spaces. An alternative line of approaches optimizes action sequences directly via gradient descent, but is prone to local optima. We propose a method to solve this planning problem by interleaving CEM and gradient descent steps when optimizing the action sequence. Our experiments show faster convergence of the proposed hybrid approach, even for high-dimensional action spaces, avoidance of local minima, and better or equal performance compared to CEM. Code accompanying the paper is available at this https URL.
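To make the interleaving concrete, the sketch below shows one way CEM sampling and gradient ascent on the model-based return can be alternated within a single planning call. It is a minimal illustration rather than the paper's exact algorithm: the function names (plan_cem_gd, rollout_return), the hyperparameter values, and the use of PyTorch autograd through assumed differentiable dynamics and reward networks are choices made for this example only.

    import torch

    def rollout_return(dynamics, reward, state, actions):
        # Return of each candidate action sequence under the learned model.
        # actions: (n_samples, horizon, action_dim); state: (state_dim,)
        n_samples, horizon, _ = actions.shape
        s = state.expand(n_samples, -1)
        total = torch.zeros(n_samples)
        for t in range(horizon):
            total = total + reward(s, actions[:, t])
            s = dynamics(s, actions[:, t])
        return total

    def plan_cem_gd(dynamics, reward, state, horizon, action_dim,
                    n_samples=500, n_elites=50, n_iters=10,
                    grad_steps=1, lr=1e-2):
        # Interleaved CEM + gradient-descent planning (illustrative sketch).
        mean = torch.zeros(horizon, action_dim)
        std = torch.ones(horizon, action_dim)

        for _ in range(n_iters):
            # CEM step: sample action sequences from the current Gaussian.
            actions = mean + std * torch.randn(n_samples, horizon, action_dim)
            actions = actions.detach().requires_grad_(True)
            returns = rollout_return(dynamics, reward, state, actions)

            # Gradient step: ascend the model-based return for every sample.
            for _ in range(grad_steps):
                grad, = torch.autograd.grad(returns.sum(), actions)
                actions = (actions + lr * grad).detach().requires_grad_(True)
                returns = rollout_return(dynamics, reward, state, actions)

            # Refit the sampling distribution to the highest-return (elite) sequences.
            elite_idx = returns.topk(n_elites).indices
            elites = actions[elite_idx].detach()
            mean, std = elites.mean(dim=0), elites.std(dim=0)

        # Receding-horizon control: execute only the first planned action.
        return mean[0]

Executing just the first action of the optimized sequence and replanning from the next observed state is the standard model-predictive control loop assumed in this sketch.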
