EP for Efficient Stochastic Control with Obstacles

We address the problem of continuous stochastic optimal control in the presence of hard obstacles. Due to the non-smooth character of the obstacles, the traditional approach of dynamic programming combined with function approximation tends to fail. We consider a recently introduced special class of control problems for which the optimal control computation can be reformulated as a path integral. The path integral is typically intractable, but it is amenable to techniques developed for approximate inference. We argue that the variational approach fails in this case because of the non-smooth cost function. Sampling techniques are simple to implement and converge to the exact result given enough samples; however, the infinite cost associated with hard obstacles renders sampling inefficient in practice. We propose Expectation Propagation (EP) as a suitable approximation method, and compare the quality and efficiency of the resulting control with a Monte Carlo (MC) sampler on a car steering task and a ball throwing task. We conclude that EP solves these challenging problems substantially better than a sampling approach.
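The abstract's core claim about sampling inefficiency can be illustrated with a toy sketch (our own construction, not the paper's implementation). In path-integral control, the optimal control at the initial state can be estimated as a weighted average of the first noise increment over uncontrolled rollouts, with weights exp(-S/λ) given by the path cost S. A hard obstacle assigns infinite cost, i.e. weight zero, to any colliding path, so the effective sample size collapses; the 1D example below (hypothetical dynamics, target, and obstacle interval) makes this visible:

```python
import numpy as np

rng = np.random.default_rng(0)

def pi_control(x0, target, obstacle, n_samples=5000,
               T=1.0, dt=0.01, sigma=1.0, lam=1.0):
    """Naive Monte Carlo estimate of the path-integral optimal control
    for 1D Brownian dynamics dx = u dt + sigma dW.

    Rolls out uncontrolled noisy paths; any path entering the hard
    obstacle interval receives infinite cost, i.e. zero weight.
    Returns the control estimate at x0 and the surviving-sample fraction.
    """
    steps = int(T / dt)
    lo, hi = obstacle
    x = np.full(n_samples, x0)
    alive = np.ones(n_samples, dtype=bool)
    first_noise = None
    for t in range(steps):
        dW = sigma * np.sqrt(dt) * rng.standard_normal(n_samples)
        if t == 0:
            first_noise = dW  # the increment that defines u*(x0)
        x = x + dW
        alive &= ~((x > lo) & (x < hi))  # hard obstacle: weight -> 0
    # quadratic end cost on surviving paths only
    w = np.where(alive, np.exp(-(x - target) ** 2 / lam), 0.0)
    if w.sum() == 0.0:
        return 0.0, 0.0  # every sample hit the obstacle
    # u*(x0) = E[w * dW_1] / (dt * E[w])
    u = (w @ first_noise) / (dt * w.sum())
    return u, alive.mean()

u, frac = pi_control(x0=0.0, target=2.0, obstacle=(0.5, 1.0))
print(f"control estimate: {u:.2f}, surviving samples: {frac:.1%}")
```

With a wide obstacle between the start state and the target, most rollouts are discarded, so the estimator's variance grows sharply; this is the inefficiency that motivates replacing naive sampling with a deterministic approximation such as EP.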
