Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies

Abstract: Fuzzy controllers are efficient and interpretable system controllers for continuous state and action spaces. To date, such controllers have been constructed manually or trained automatically, either using expert-generated, problem-specific cost functions or incorporating detailed knowledge about the optimal control strategy. Neither requirement for automatic training is met in most real-world reinforcement learning (RL) problems. In such applications, online learning is often prohibited for safety reasons because it requires exploring the problem's dynamics during policy training. We introduce a fuzzy particle swarm reinforcement learning (FPSRL) approach that constructs fuzzy RL policies solely by training parameters on world models that simulate the real system dynamics. These world models are created by an autonomous machine learning technique from previously generated transition samples of the real system. To the best of our knowledge, this approach is the first to relate self-organizing fuzzy controllers to model-based batch RL. FPSRL is intended for domains where online learning is prohibited, where the system dynamics are relatively easy to model from transition samples previously generated by a default policy, and where a relatively easily interpretable control policy is expected to exist. The efficiency of the proposed approach on problems from such domains is demonstrated using three standard RL benchmarks, namely mountain car, cart-pole balancing, and cart-pole swing-up. The experimental results show that FPSRL produces high-performing, interpretable fuzzy policies.
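The idea of training fuzzy policy parameters by particle swarm optimization on a world model can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the paper's implementation: the world model is replaced by hand-written mountain-car dynamics (`model_step`) rather than a model learned from transition samples, the policy uses two Gaussian-membership rules with a `tanh` output (`fuzzy_policy`), and the optimizer is a plain global-best PSO with commonly used coefficients (`pso`). All function and parameter names are hypothetical.

```python
import numpy as np

def model_step(state, action):
    """Stand-in for a learned world model (assumption): mountain-car dynamics."""
    pos, vel = state
    vel = np.clip(vel + 0.001 * action - 0.0025 * np.cos(3 * pos), -0.07, 0.07)
    pos = np.clip(pos + vel, -1.2, 0.6)
    if pos <= -1.2:  # inelastic left wall
        vel = 0.0
    return np.array([pos, vel])

def fuzzy_policy(params, state, n_rules=2):
    """Each rule has Gaussian memberships over the state and a constant output."""
    p = params.reshape(n_rules, 5)  # per rule: 2 centers, 2 log-widths, 1 output
    centers, widths, outputs = p[:, :2], np.exp(p[:, 2:4]), p[:, 4]
    z = state / np.array([1.0, 0.07])  # rough state normalization
    activation = np.exp(-0.5 * np.sum(((z - centers) / widths) ** 2, axis=1))
    # Activation-weighted average of rule outputs, squashed to [-1, 1].
    return np.tanh(np.dot(activation, outputs) / (np.sum(activation) + 1e-12))

def rollout_return(params, horizon=200):
    """Return of one model rollout: -1 per step until the goal is reached."""
    state = np.array([-0.5, 0.0])
    total = 0.0
    for _ in range(horizon):
        state = model_step(state, fuzzy_policy(params, state))
        if state[0] >= 0.5:  # goal reached, stop accumulating cost
            break
        total -= 1.0
    return total

def pso(fitness, dim, n_particles=15, n_iters=20, seed=0):
    """Global-best PSO that maximizes `fitness` over a `dim`-dimensional vector."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmax(pbest_f)].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Inertia plus attraction to personal and global best positions.
        v = 0.72 * v + 1.49 * r1 * (pbest - x) + 1.49 * r2 * (g - x)
        x = x + v
        f = np.array([fitness(p) for p in x])
        improved = f > pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        g = pbest[np.argmax(pbest_f)].copy()
    return g, float(pbest_f.max())

# Train the 10 policy parameters (2 rules x 5 values) purely on the model.
best_params, best_return = pso(rollout_return, dim=10)
print("best model return:", best_return)
```

Because the fitness is evaluated only through model rollouts, no interaction with the real system is needed during training. The resulting parameter vector can be read back as rule centers, widths, and outputs, which is what makes such a policy inspectable.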
