Non-Linear Stochastic Control in Continuous State Spaces by Exact Integration in Bellman's Equations

We present an algorithm for sequential control tasks with non-linear stochastic dynamics and inhomogeneous noise in continuous state spaces. The algorithm performs approximate value-iteration steps on a selected set of prototypical states, with the cost-to-go approximated by a radial-basis-function network. With this representation, the resulting Bellman equations can be integrated exactly with respect to the transition densities of a large class of stochastic dynamical systems, which yields a fast and efficient modified value-iteration procedure.
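To illustrate why a radial-basis-function cost-to-go permits exact integration, the following is a minimal sketch and not the paper's implementation. It assumes Gaussian basis functions and a Gaussian transition density, one member of the class of systems the abstract refers to. Under these assumptions the expectation of each basis function has the closed form sqrt(|S| / |S + Sigma|) * exp(-0.5 (mu - c)^T (S + Sigma)^{-1} (mu - c)). The function names, the discount factor `gamma`, and the `cost` and `dynamics` callables are hypothetical and introduced only for this example.

```python
import numpy as np

def expected_rbf_value(weights, centers, widths, mu, sigma):
    """Closed-form E[V(x')] for x' ~ N(mu, sigma), where
    V(x) = sum_i w_i * exp(-0.5 * (x - c_i)^T S_i^{-1} (x - c_i)).
    Assumes Gaussian basis functions and Gaussian transition noise."""
    total = 0.0
    for w, c, S in zip(weights, centers, widths):
        combined = S + sigma                       # basis width plus noise covariance
        diff = mu - c
        scale = np.sqrt(np.linalg.det(S) / np.linalg.det(combined))
        quad = diff @ np.linalg.solve(combined, diff)
        total += w * scale * np.exp(-0.5 * quad)
    return total

def bellman_backup(x, actions, cost, dynamics, weights, centers, widths, gamma=0.95):
    """One value-iteration backup at a prototype state x (hypothetical helper).
    dynamics(x, u) returns (mu, sigma) of the next-state density, so the
    noise covariance may depend on the state and action."""
    return min(
        cost(x, u) + gamma * expected_rbf_value(weights, centers, widths, *dynamics(x, u))
        for u in actions
    )
```

Because the integral is evaluated analytically, no sampling or grid over next states is needed inside the backup; in a full procedure the backed-up values at the prototype states would then be used to refit the network weights.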
