Applying expectation-maximization evaluation to approximate optimal control

In this paper we proposed an approach to approximating optimal tracking via expectation-maximization (EM) evaluation. Starting from a discussion of applying reinforcement learning (RL) to a system with unknown internal dynamics, we presented the challenge of using the classical Q-learning framework for a tracking task. We then explained the idea of redefining the cost function (i.e. the criterion) of Q-learning to address the tracking task's requirement for knowledge of the system dynamics. We discussed the advantages of dividing the original trajectory-tracking task into two machine-learning subtasks learned on-line: learning the quadratic regulator and learning the baseline command generator. Details were given on the integration of the Q-learning framework with the EM algorithm, and on convergence to the optimal control via iterative estimation of the optimal regulator and the baseline generator. Initial simulation results on a second-order system showed that the Q-learning framework integrated with the EM algorithm can approximate the optimal tracking solution.
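To make the division into two on-line subtasks concrete, the following is a minimal Python sketch for a hypothetical discrete-time second-order plant and a constant reference. It substitutes a simple least-squares model fit and a Riccati-style iteration for the paper's EM-based Q-function evaluation, so it illustrates only the alternating structure (regulator update, then baseline-command update), not the exact algorithm presented here; all names (A_true, x_ref, rollout, etc.) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy plant (discrete-time, second order); the true matrices are
# used only to generate simulation data and are never given to the learner.
A_true = np.array([[1.0, 0.1],
                   [-0.3, 0.9]])
B_true = np.array([[0.0],
                   [0.1]])
x_ref = np.array([1.0, 0.0])          # constant reference state to track
Q, R = np.eye(2), 0.1 * np.eye(1)     # quadratic tracking / effort weights

def rollout(K, u_bar, steps=300, rng=np.random.default_rng(0)):
    """Run the closed loop u = -K (x - x_ref) + u_bar (plus exploration noise)
    and collect (x, u, x_next) transitions."""
    x, data = np.zeros(2), []
    for _ in range(steps):
        u = -K @ (x - x_ref) + u_bar + 0.1 * rng.standard_normal(1)
        x_next = A_true @ x + B_true @ u
        data.append((x.copy(), u.copy(), x_next.copy()))
        x = x_next
    return data

def identify(data):
    """Least-squares fit of the transition model from collected data
    (a stand-in here for the model-free Q-function evaluation)."""
    Z = np.array([np.concatenate([x, u]) for x, u, _ in data])
    Xn = np.array([xn for _, _, xn in data])
    Theta, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
    return Theta[:2].T, Theta[2:].T    # A_hat (2x2), B_hat (2x1)

def regulator_gain(A_hat, B_hat, iters=200):
    """Subtask 1: solve the quadratic-regulator problem on the fitted model."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
        Acl = A_hat - B_hat @ K
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    return K

def baseline_command(A_hat, B_hat):
    """Subtask 2: baseline (feedforward) command that makes the reference an
    equilibrium of the fitted model."""
    rhs = (np.eye(2) - A_hat) @ x_ref
    u_bar, *_ = np.linalg.lstsq(B_hat, rhs, rcond=None)
    return u_bar

K, u_bar = np.zeros((1, 2)), np.zeros(1)
for it in range(5):                    # alternate the two subtasks on-line
    data = rollout(K, u_bar)
    A_hat, B_hat = identify(data)
    K = regulator_gain(A_hat, B_hat)
    u_bar = baseline_command(A_hat, B_hat)
    err = np.mean([np.linalg.norm(x - x_ref) for x, _, _ in data[-50:]])
    print(f"iteration {it}: mean tracking error over last 50 steps = {err:.3f}")
```

In this sketch the regulator gain and the baseline command are re-estimated from each batch of closed-loop data, mirroring the iterative estimation of the optimal regulator and the baseline generator described above; the residual tracking error after a few iterations is dominated by the exploration noise.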