Approximate dynamic programming with a fuzzy parameterization

Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set; in practice, the solutions must be approximated. We therefore propose an algorithm for approximate DP that relies on a fuzzy partition of the state space and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes under the discounted return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution. A bound on the suboptimality of the solution obtained in a finite number of iterations is also derived. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated in a synchronous fashion and when they are updated asynchronously. The asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem.
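To make the scheme concrete, the following is a minimal sketch of synchronous fuzzy Q-iteration on a toy one-dimensional deterministic problem. The dynamics, reward, triangular membership functions, and all constants below are illustrative assumptions for this sketch (the paper's benchmark is a two-link manipulator, not reproduced here); only the overall structure, a Q-function parameterized over fuzzy-set cores and discretized actions, updated by repeated application of the backup, follows the abstract.

```python
import numpy as np

# Illustrative sketch of synchronous fuzzy Q-iteration (assumed toy problem).
gamma = 0.9                              # discount factor
centers = np.linspace(-1.0, 1.0, 11)     # cores of the triangular fuzzy sets
actions = np.array([-0.1, 0.0, 0.1])     # discretized action set

def memberships(x):
    """Normalized triangular membership degrees of state x (sum to 1)."""
    width = centers[1] - centers[0]
    phi = np.maximum(0.0, 1.0 - np.abs(x - centers) / width)
    return phi / phi.sum()

def f(x, u):
    """Assumed deterministic dynamics: drift toward the origin plus the action."""
    return np.clip(0.9 * x + u, -1.0, 1.0)

def rho(x, u):
    """Assumed reward: penalize squared distance from the origin."""
    return -x ** 2

# One parameter per (fuzzy-set core, discrete action) pair.
theta = np.zeros((len(centers), len(actions)))

def q_hat(x, j, th):
    """Approximate Q-value of (x, actions[j]): membership-weighted parameters."""
    return memberships(x) @ th[:, j]

# Synchronous update: every parameter is computed from the previous theta.
for _ in range(200):
    new_theta = np.empty_like(theta)
    for i, xi in enumerate(centers):
        for j, uj in enumerate(actions):
            x_next = f(xi, uj)
            new_theta[i, j] = rho(xi, uj) + gamma * max(
                q_hat(x_next, jp, theta) for jp in range(len(actions)))
    theta = new_theta

# Greedy policy at a query state: pick the action maximizing the approximate Q.
x = 0.5
best = actions[int(np.argmax([q_hat(x, j, theta) for j in range(len(actions))]))]
```

The asynchronous variant discussed in the abstract differs only in that updated parameters are written back into `theta` in place (dropping `new_theta`), so later updates within the same sweep already use the fresh values.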
