Paper Information - Non-Linear Stochastic Control in Continuous State Spaces by Exact Integration in Bellman's Equations

We present an algorithm for sequential control of tasks with non-linear stochastic dynamics and inhomogeneous noise in continuous state spaces. The algorithm performs approximate value-iteration steps on a selected set of prototypical states, whose cost-to-go is approximated by a radial-basis-function network. This representation allows the resulting Bellman equations to be integrated exactly with respect to the transition densities of a large class of stochastic dynamical systems, yielding a fast and efficient modified value-iteration procedure.
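The exact-integration idea can be illustrated with a minimal one-dimensional sketch. It rests on assumptions the abstract does not state: Gaussian basis functions, a Gaussian transition density whose variance may depend on state and action, a finite action set, and a discounted cost. All names (`expected_rbf`, `expected_cost_to_go`, `backup`, `mean_fn`, `var_fn`, `step_cost`) are hypothetical and do not come from the paper. Under these assumptions the expectation of each basis function over the next state has a closed form, so a Bellman backup at a prototypical state needs no sampling or numerical quadrature.

```python
import math


def expected_rbf(mu, var, center, width2):
    """Closed-form E[exp(-(x - center)^2 / (2 * width2))] for x ~ N(mu, var).

    Assumption: Gaussian basis function and Gaussian transition density.
    """
    total = width2 + var
    return math.sqrt(width2 / total) * math.exp(-0.5 * (mu - center) ** 2 / total)


def expected_cost_to_go(mu, var, centers, width2, weights):
    """Expected value of the RBF network at the next state, integrated exactly."""
    return sum(w * expected_rbf(mu, var, c, width2)
               for w, c in zip(weights, centers))


def backup(x, actions, step_cost, mean_fn, var_fn,
           centers, width2, weights, gamma=0.95):
    """One Bellman backup at state x over a finite (hypothetical) action set.

    mean_fn(x, a) and var_fn(x, a) give the mean and variance of the next
    state; a state-dependent var_fn models inhomogeneous noise.
    """
    return min(
        step_cost(x, a)
        + gamma * expected_cost_to_go(mean_fn(x, a), var_fn(x, a),
                                      centers, width2, weights)
        for a in actions
    )
```

A full value-iteration step would apply `backup` at every prototypical state and then refit the network weights to the backed-up values, for example by solving a linear system. That refitting step is omitted here.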

Daniel N. Nikovski | Matthew E. Brand
