Variational Inference MPC using Tsallis Divergence

In this paper, we present a generalized framework for Variational Inference-Stochastic Optimal Control based on the non-extensive Tsallis divergence. By incorporating the deformed exponential function into the optimality likelihood function, we derive a novel Tsallis Variational Inference-Model Predictive Control algorithm that includes prior methods such as Variational Inference-Model Predictive Control, Model Predictive Path Integral Control, the Cross Entropy Method, and Stein Variational Inference Model Predictive Control as special cases. The proposed algorithm allows effective control of the cost/reward transform and shows superior performance, reducing both the mean and the variance of the associated cost. These properties are supported by a theoretical and numerical analysis of the algorithm's level of risk sensitivity, as well as by simulation experiments on five robotic systems with three policy parameterizations.
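To make the role of the deformed exponential more concrete, the following minimal sketch implements the standard Tsallis q-exponential and uses it to turn sampled trajectory costs into normalized weights. It is an illustration under stated assumptions, not the algorithm derived in the paper: the function names `q_exp` and `trajectory_weights`, the temperature `lam`, and the shift by the minimum cost are all hypothetical choices made here for readability.

```python
import math


def q_exp(x, q):
    """Tsallis deformed exponential: [1 + (1 - q) * x]_+ ** (1 / (1 - q)).

    Recovers the ordinary exponential exp(x) in the limit q -> 1.
    """
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # The bracket is clipped at zero: the value is 0 for q < 1
        # (positive exponent) and diverges for q > 1 (negative exponent).
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))


def trajectory_weights(costs, q, lam):
    """Map trajectory costs to normalized weights via the q-exponential.

    Assumption: costs are shifted by their minimum so the best sample
    always receives raw weight 1; `lam` is a hypothetical temperature.
    """
    best = min(costs)
    raw = [q_exp(-(c - best) / lam, q) for c in costs]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    costs = [0.0, 1.0, 3.0]
    for q in (0.5, 1.0, 1.5):
        print(q, trajectory_weights(costs, q, lam=1.0))
```

In this sketch, q = 1 gives the familiar exponential (softmax-style) weighting, q < 1 assigns exactly zero weight to samples whose shifted cost exceeds `lam / (1 - q)`, and q > 1 yields heavier-tailed weights. This gives an intuition for how a single deformation parameter can change the shape of the cost transform.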
