An Idea of Using Reinforcement Learning in Adaptive Control Systems

This paper presents the concept of using a Reinforcement Learning approach for adaptive auto-tuning of PID controllers, which may considerably widen the range of applications of the controller. The proposed idea can be used in both off-line and on-line control systems, but it is most powerful in practical on-line applications, where the environment is unknown and all information is gained only through interaction. An algorithm based on Q-learning is proposed, and its properties are illustrated on simple examples.
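To make the idea concrete, the following is a minimal sketch of how tabular Q-learning could adjust a single controller gain through interaction alone. It is not the algorithm from the paper, whose details are not given in the abstract. The action set `ACTIONS`, the toy first-order plant in `tracking_cost`, the reward definition, and all parameter values are assumptions chosen purely for illustration, and only the proportional gain is tuned.

```python
import random

# Hypothetical discretisation: each action nudges the proportional gain Kp.
ACTIONS = (-0.1, 0.0, 0.1)


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice among the gain adjustments."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def tracking_cost(kp, steps=50):
    """Sum of squared errors for a unit step on an assumed toy plant
    (y' = u - y, Euler step 0.1) under proportional-only control."""
    y, cost = 0.0, 0.0
    for _ in range(steps):
        error = 1.0 - y
        u = kp * error
        y += 0.1 * (u - y)
        cost += error * error
    return cost


def tune(episodes=200, seed=0):
    """Learn gain adjustments from interaction only; returns the final Kp and the Q-table."""
    rng = random.Random(seed)
    q = {}
    kp = 1.0
    for _ in range(episodes):
        state = round(kp, 1)  # the discretised gain serves as the state
        action = choose_action(q, state, epsilon=0.2, rng=rng)
        new_kp = min(5.0, max(0.1, kp + action))
        # Assumed reward: positive when the tracking cost shrinks.
        reward = tracking_cost(kp) - tracking_cost(new_kp)
        q_update(q, state, action, reward, round(new_kp, 1))
        kp = new_kp
    return kp, q
```

Calling `tune()` runs the learning loop and returns the final gain with the learned Q-table. In an on-line setting, `tracking_cost` would be replaced by measurements from the real process rather than a simulated plant.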
