A Reinforcement Learning (RL) method, SARSA, is used to dynamically tune a PI-controller for an experimental Continuous Stirred Tank Heater (CSTH) setup. To start from an acceptable policy, the proposed approach first trains the RL agent in simulation on an approximate First Order Plus Time Delay (FOPTD) model before implementation on the real plant. Because of the resulting plant-model mismatch, the RL-based PI-controller initially performs worse on the real plant than in simulation; however, continued training on the plant itself yields a significant performance improvement. In contrast, IMC-tuned PI-controllers, among the most widely used feedback controllers, degrade under the inevitable plant-model mismatch. Experimental tests are carried out for both set-point tracking and disturbance rejection, and in both cases the successful adaptability of the RL-based PI-controller is clearly evident.
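The idea of pre-training a SARSA agent on an approximate FOPTD model can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the plant parameters, initial gains, action set, and reward (negative integral absolute error) are all assumptions chosen for the example.

```python
import random

def simulate_episode(Kp, Ki, K=1.0, tau=10.0, delay=3, dt=1.0, steps=60, sp=1.0):
    """Closed-loop run of a PI-controller on a discretized FOPTD plant.

    Returns the negative integral absolute error (IAE) as the episode reward.
    Plant parameters (gain K, time constant tau, delay) are assumed values.
    """
    y, integ = 0.0, 0.0
    u_hist = [0.0] * delay              # input delay line
    iae = 0.0
    for _ in range(steps):
        e = sp - y
        integ += e * dt
        u = Kp * e + Ki * integ         # PI control law
        u_hist.append(u)
        u_d = u_hist.pop(0)             # delayed control input
        y += dt / tau * (-y + K * u_d)  # Euler step of first-order dynamics
        iae += abs(e) * dt
    return -iae                         # less error -> higher reward

# Discrete action space: small multiplicative tweaks to (Kp, Ki).
ACTIONS = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.0), (1.0, 1.1), (1.0, 0.9)]

def sarsa_tune(episodes=300, alpha=0.2, gamma=0.9, eps=0.2, seed=0):
    """Tabular on-policy SARSA over the space of (rounded) PI gains."""
    rng = random.Random(seed)
    Q = {}                              # state-action values
    Kp, Ki = 0.5, 0.05                  # initial guess (assumed, e.g. from IMC rules)

    def state(kp, ki):
        return (round(kp, 2), round(ki, 3))

    def choose(s):                      # epsilon-greedy action selection
        if rng.random() < eps:
            return rng.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: Q.get((s, a), 0.0))

    s = state(Kp, Ki)
    a = choose(s)
    for _ in range(episodes):
        fp, fi = ACTIONS[a]
        Kp, Ki = Kp * fp, Ki * fi
        r = simulate_episode(Kp, Ki)
        s2 = state(Kp, Ki)
        a2 = choose(s2)
        # On-policy SARSA update: bootstraps on the action actually taken next.
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + alpha * (r + gamma * Q.get((s2, a2), 0.0) - q)
        s, a = s2, a2
    return Kp, Ki
```

On the real plant, `simulate_episode` would be replaced by closed-loop runs on the hardware, so the policy pre-trained on the FOPTD model continues to adapt under the true dynamics.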