Path Integral Control by Reproducing Kernel Hilbert Space Embedding

We present an embedding of stochastic optimal control problems of the so-called path integral form into reproducing kernel Hilbert spaces. Using consistent, sample-based estimates of the embedding yields a model-free, non-parametric approach for computing an approximate solution to the control problem. The formulation admits a decomposition of the problem into an invariant component and a task-dependent component. Consequently, we make much more efficient use of the sample data than previous sample-based approaches in this domain, for example by allowing sample re-use across tasks. Numerical examples on test problems illustrate the sample efficiency.
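To make the idea of an invariant and a task-dependent component concrete, the following is a minimal sketch, not the algorithm of the paper. It uses the standard regularized sample estimate of a conditional mean embedding to approximate an expectation of the form E[exp(-phi(X')) | x] from transition samples alone. The 1-D toy dynamics, the Gaussian kernel, the bandwidth, the regularizer, and all function names are assumptions made for illustration. The weights depend only on the samples and are computed once; each task merely supplies a different cost `phi`.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=0.5):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an assumed value."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def embedding_weights(xs, x_query, reg=1e-3):
    """Task-invariant part: weights (G + reg*m*I)^{-1} k(x_query)."""
    m = len(xs)
    G = [[gaussian_kernel(xs[i], xs[j]) + (reg * m if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    k = [gaussian_kernel(xi, x_query) for xi in xs]
    return solve(G, k)


def conditional_expectation(weights, next_xs, f):
    """Task-dependent part: weighted sum of f over the sampled successors."""
    return sum(w * f(xn) for w, xn in zip(weights, next_xs))


# Hypothetical transition samples from toy uncontrolled dynamics x' = x + noise.
random.seed(0)
xs = [random.uniform(-2.0, 2.0) for _ in range(60)]
next_xs = [x + random.gauss(0.0, 0.1) for x in xs]

# Weights are computed once from the samples and reused for both tasks.
weights = embedding_weights(xs, x_query=0.5)

# Two tasks that differ only in their (assumed) quadratic state cost.
est_a = conditional_expectation(weights, next_xs, lambda x: math.exp(-(x - 1.0) ** 2))
est_b = conditional_expectation(weights, next_xs, lambda x: math.exp(-(x + 1.0) ** 2))
print(est_a, est_b)
```

In this sketch the linear solve is the only expensive step and does not involve the cost, which is one way to read the sample re-use across tasks described above.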
