A path integral approach to agent planning

Control theory is a mathematical description of how to act optimally to gain future rewards. In this paper, we discuss a class of non-linear stochastic control problems that can be solved efficiently using a path integral. In this control formalism, the central concept of the cost-to-go, or value function, becomes a free energy, so that methods and concepts from statistical physics, such as Monte Carlo sampling and the Laplace approximation, can be readily applied. When applied to a receding horizon problem in a stationary environment, the solution resembles the one obtained by traditional reinforcement learning with discounted reward. We show that this solution can be computed more efficiently than in the discounted-reward framework. As shown in previous work, the approach generalizes easily to time-dependent tasks and is therefore of great relevance for modeling real-time interactions between agents.
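
To make the free-energy interpretation concrete, the path-integral formalism expresses the optimal cost-to-go as J(x, t) = -lambda log E[exp(-S(tau)/lambda)], where the expectation runs over trajectories of the uncontrolled stochastic dynamics and S(tau) is the accumulated path cost. The sketch below is a minimal Monte Carlo estimator of this quantity in Python, not code from the paper: the drift f, running cost q, end cost phi, and all parameter values are illustrative assumptions.

```python
import numpy as np

def value_function_mc(x0, t0, T, dt, lam, nu, f, q, phi,
                      n_samples=10_000, rng=None):
    """Estimate J(x0, t0) = -lam * log E[exp(-S/lam)] by forward
    sampling of the uncontrolled diffusion
        dx = f(x, t) dt + dxi,   Var(dxi) = nu * dt,
    where lam = nu * R ties the noise level to the quadratic control
    cost (the condition that linearizes the problem).
    All function names here are illustrative, not the paper's notation.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round((T - t0) / dt))
    x = np.full(n_samples, x0, dtype=float)
    S = np.zeros(n_samples)            # accumulated path cost per sample
    t = t0
    for _ in range(n_steps):
        S += q(x, t) * dt              # running state cost
        # Euler-Maruyama step of the *uncontrolled* dynamics
        x += f(x, t) * dt + rng.normal(0.0, np.sqrt(nu * dt), n_samples)
        t += dt
    S += phi(x)                        # end cost at the horizon
    # Free-energy (log partition sum) form of the optimal cost-to-go
    return -lam * np.log(np.mean(np.exp(-S / lam)))

# Example: 1-d problem with zero drift, quadratic running and end costs.
J = value_function_mc(x0=1.0, t0=0.0, T=1.0, dt=0.01, lam=0.5, nu=0.5,
                      f=lambda x, t: np.zeros_like(x),
                      q=lambda x, t: 0.5 * x**2,
                      phi=lambda x: 0.5 * x**2)
print(f"Estimated cost-to-go J(1, 0) ~ {J:.3f}")
```

In a practical implementation one would stabilize the average of exp(-S/lam) with a log-sum-exp trick, since large path costs can underflow; the plain mean is kept here for clarity.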