A reinforcement learning algorithm with fuzzy approximation for semi Markov decision problems

Real-life stochastic problems are generally large-scale and difficult to model, and therefore suffer from the curse of dimensionality. Such problems cannot be solved by classical optimization methods. This paper presents a reinforcement learning algorithm that uses a fuzzy inference system, ANFIS, to find approximate solutions to semi-Markov decision problems (SMDPs). The performance of the developed algorithm is measured and compared with that of a classical reinforcement learning algorithm, SMART, in a numerical example. The numerical results show that the developed algorithm converges significantly faster as the problem size increases, and that the average cost it calculates approaches that of SMART as the number of epochs used in the developed algorithm increases.
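For readers unfamiliar with the SMART baseline, the following is a minimal sketch of a tabular, average-cost, SMART-style update for an SMDP. It is not the paper's implementation: the function names (`smart_update`, `run_smart`, `toy_step`), the two-state toy problem, and all parameter values are assumptions made for illustration. The ANFIS-based approximation described in the abstract would replace the lookup table `q` and is not shown.

```python
import random


def smart_update(q, state, action, cost, sojourn, next_state, rho, alpha):
    """Apply one SMART-style update to the tabular value q[state][action].

    The immediate cost is offset by rho * sojourn, where rho is the current
    estimate of the average cost per unit time and sojourn is the time
    spent in the transition.
    """
    best_next = min(q[next_state].values())  # cost setting: take the minimum
    target = cost - rho * sojourn + best_next
    q[state][action] = (1 - alpha) * q[state][action] + alpha * target
    return q[state][action]


def run_smart(step, states, actions, n_steps=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Run an epsilon-greedy learning loop on a simulated SMDP.

    `step(state, action, rng)` is a hypothetical simulator that returns
    (cost, sojourn_time, next_state).
    """
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in actions} for s in states}
    total_cost = 0.0
    total_time = 0.0
    rho = 0.0
    state = states[0]
    for _ in range(n_steps):
        greedy = min(q[state], key=q[state].get)
        explore = rng.random() < epsilon
        action = rng.choice(actions) if explore else greedy
        cost, sojourn, next_state = step(state, action, rng)
        smart_update(q, state, action, cost, sojourn, next_state, rho, alpha)
        if not explore:
            # Update the average-cost estimate on non-exploratory steps only.
            total_cost += cost
            total_time += sojourn
            rho = total_cost / total_time
        state = next_state
    return q, rho


def toy_step(state, action, rng):
    """Hypothetical two-state SMDP: action 0 costs 1.0, action 1 costs 3.0."""
    cost = 1.0 if action == 0 else 3.0
    sojourn = rng.uniform(1.0, 2.0)  # random transition time, always positive
    next_state = rng.choice([0, 1])
    return cost, sojourn, next_state


if __name__ == "__main__":
    q_table, avg_cost = run_smart(toy_step, states=[0, 1], actions=[0, 1])
    print("estimated average cost per unit time:", avg_cost)
```

The table `q` grows with the number of state-action pairs, which is the scaling problem that motivates replacing it with a function approximator.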
