Interpretable Control by Reinforcement Learning

In this paper, three recently introduced reinforcement learning (RL) methods are used to generate human-interpretable policies for the cart-pole balancing benchmark. These novel RL methods learn human-interpretable policies in the form of compact fuzzy controllers and simple algebraic equations. The resulting representations, as well as the achieved control performances, are compared with those of two classical controller design methods and three non-interpretable RL methods. All eight methods utilize the same previously generated data batch and produce their controllers offline, without interaction with the real benchmark dynamics. The experiments show that the novel RL methods automatically generate well-performing policies that are at the same time human-interpretable. Furthermore, one of the methods is applied to learn an equation-based policy for a hardware cart-pole demonstrator, using only batch data generated by human players. The solution produced on the first attempt already represents a successful balancing policy, which demonstrates the method's applicability to real-world problems.
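To make the two interpretable policy representations concrete, the following minimal Python sketch shows (a) an algebraic-equation policy and (b) a compact two-rule Mamdani-style fuzzy controller for cart-pole balancing. The state variables follow the standard cart-pole formulation; every function name, coefficient, and membership parameter here is an illustrative assumption, not a policy learned in the paper.

# Minimal sketches of the two interpretable policy classes described above.
# All coefficients, membership-function parameters, and the bang-bang action
# encoding are illustrative assumptions, not the learned policies.

def equation_policy(x, x_dot, theta, theta_dot):
    """Algebraic-equation policy: a linear combination of the four cart-pole
    state variables (cart position/velocity, pole angle/angular velocity),
    thresholded into a discrete push action."""
    u = 0.5 * x + 1.0 * x_dot + 20.0 * theta + 3.0 * theta_dot
    return +1 if u > 0.0 else -1  # +1 = push right, -1 = push left


def tri(z, a, b, c):
    """Triangular membership function peaking at b with support (a, c)."""
    if z <= a or z >= c:
        return 0.0
    return (z - a) / (b - a) if z < b else (c - z) / (c - b)


def fuzzy_policy(theta):
    """Compact two-rule fuzzy controller over the pole angle:
    IF theta is NEGATIVE THEN push left; IF theta is POSITIVE THEN push right.
    Defuzzification is a weighted average of the rule output forces."""
    neg = tri(theta, -0.4, -0.2, 0.0)
    pos = tri(theta, 0.0, 0.2, 0.4)
    if neg + pos == 0.0:
        return 0.0  # pole upright: apply no force
    return (-10.0 * neg + 10.0 * pos) / (neg + pos)  # force in [-10, 10]


# Pole tilted slightly right and falling further right: both policies push right.
print(equation_policy(x=0.0, x_dot=0.0, theta=0.05, theta_dot=0.2))  # +1
print(fuzzy_policy(theta=0.05))                                      # 10.0

Both sketches are short enough to be read and verified by a human, which is the sense in which such policies are "interpretable" in contrast to, e.g., neural-network controllers.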
