An Idea of Using Reinforcement Learning in Adaptive Control Systems

This paper presents a concept for using Reinforcement Learning in the adaptive auto-tuning of PID controllers, which may considerably broaden the controller's range of applications. The proposed idea can be used in both off-line and on-line control systems, but it is most powerful in practical on-line applications, where the environment is unknown and all information is gained only through interaction. An algorithm based on Q-learning is proposed, and its properties are illustrated with simple examples.
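To make the idea concrete, the following is a minimal sketch of how tabular Q-learning could drive gain tuning. It is not the paper's algorithm, which the abstract does not specify. Everything below is an assumption made for illustration: the first-order plant dy/dt = -y + u, the grid of candidate proportional gains, the fixed integral and derivative gains, the three actions (lower, keep, raise the gain), and the reward defined as the negative integrated squared error of a unit-step response.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def episode_cost(kp, ki=0.5, kd=0.0, steps=200, dt=0.05):
    """Integrated squared error of a PID loop tracking a unit step.

    The plant dy/dt = -y + u is a hypothetical stand-in, integrated
    with the explicit Euler method.
    """
    y = 0.0
    integral = 0.0
    prev_err = 1.0
    cost = 0.0
    for _ in range(steps):
        err = 1.0 - y
        integral += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv  # PID control law
        y += (-y + u) * dt                         # Euler step of the plant
        prev_err = err
        cost += err * err * dt
    return cost


def tune_kp(gains=(0.5, 1.0, 2.0, 4.0), episodes=300, epsilon=0.2, seed=0):
    """Learn a Q-table over gain indices (states) and gain moves (actions)."""
    rng = random.Random(seed)
    actions = (-1, 0, 1)  # assumed action set: lower, keep, or raise the gain
    q = [[0.0] * len(actions) for _ in gains]
    state = 0
    for _ in range(episodes):
        # Epsilon-greedy choice: explore at random, otherwise act greedily.
        if rng.random() < epsilon:
            a = rng.randrange(len(actions))
        else:
            a = max(range(len(actions)), key=lambda i: q[state][i])
        next_state = min(max(state + actions[a], 0), len(gains) - 1)
        reward = -episode_cost(gains[next_state])  # lower error, higher reward
        q_update(q, state, a, reward, next_state)
        state = next_state
    return q
```

Calling `tune_kp()` returns a Q-table with one row per candidate gain and one column per action. In an on-line setting the simulated `episode_cost` would be replaced by a cost measured on the real process, since the plant model is assumed to be unknown there.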
