Dynamic tuning of PI-controllers based on model-free Reinforcement Learning methods

A Reinforcement Learning (RL) method called SARSA is used to dynamically tune a PI-controller for a Continuous Stirred Tank Heater (CSTH) experimental setup. To start from an acceptable policy, the proposed approach trains the RL agent in simulation on an approximate First Order Plus Time Delay (FOPTD) model before it is implemented on the real plant. Because of plant-model mismatch, the RL-based PI-controller running the simulation-derived policy performs worse on the real plant than it does in simulation; however, continued training on the real plant yields a significant performance improvement. By contrast, the performance of PI-controllers tuned by Internal Model Control (IMC), which are the most commonly used feedback controllers, degrades because of the inevitable plant-model mismatch. Experimental tests are carried out for both set-point tracking and disturbance rejection. In both cases, the RL-based PI-controller adapts successfully.
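
The simulation-based pre-training step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the FOPTD parameters (K, tau, theta), the grid of candidate gains KC_GRID, the three-action set, the fixed integral time and the negative-IAE reward are all assumptions chosen for illustration, and the gain is held constant over each simulated step response, which is a simplification of dynamic tuning. Only the tabular SARSA update itself is the standard rule.

```python
import random
from collections import defaultdict, deque

# Hypothetical discretisation: the state is an index into a grid of
# proportional gains; actions move one grid point down, stay, or move up.
KC_GRID = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ACTIONS = [-1, 0, 1]


def simulate_foptd_pi(kc, tau_i, K=1.0, tau=10.0, theta=2.0,
                      setpoint=1.0, dt=0.1, t_end=60.0):
    """Euler simulation of a PI loop on an assumed FOPTD model.

    Model: tau * dy/dt = -y + K * u(t - theta).
    Returns the integral of absolute error (IAE) for a set-point step.
    """
    n_delay = int(round(theta / dt))
    delay_line = deque([0.0] * n_delay)  # holds inputs not yet seen by the plant
    y = integral = iae = 0.0
    for _ in range(int(round(t_end / dt))):
        e = setpoint - y
        integral += e * dt
        u = kc * (e + integral / tau_i)  # ideal-form PI control law
        delay_line.append(u)
        u_delayed = delay_line.popleft()  # input applied n_delay steps ago
        y += dt * (K * u_delayed - y) / tau
        iae += abs(e) * dt
    return iae


def epsilon_greedy(Q, s, eps, rng):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """On-policy update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_error = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


def train(episodes=20, steps=10, eps=0.2, seed=0):
    """Pre-train a tabular SARSA agent on the simulated FOPTD loop."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = rng.randrange(len(KC_GRID))
        a = epsilon_greedy(Q, s, eps, rng)
        for _ in range(steps):
            # Apply the gain change, clipped to the grid boundaries.
            s_next = min(max(s + a, 0), len(KC_GRID) - 1)
            # Assumed reward: negative IAE of a simulated step response.
            r = -simulate_foptd_pi(KC_GRID[s_next], tau_i=10.0)
            a_next = epsilon_greedy(Q, s_next, eps, rng)
            sarsa_update(Q, s, a, r, s_next, a_next)
            s, a = s_next, a_next
    return Q


if __name__ == "__main__":
    Q = train()
    best = max(range(len(KC_GRID)), key=lambda s: max(Q[(s, a)] for a in ACTIONS))
    print("Highest-valued gain on the assumed model:", KC_GRID[best])
```

In this sketch, the table Q returned by train would play the role of the initial policy; continuing the same update loop with rewards measured on the real plant instead of simulate_foptd_pi would correspond to the on-plant training stage.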