Derivation of integrated state equation for combined outputs-inputs vector of discrete-time linear time-invariant system and its application to reinforcement learning

For a partially observable discrete-time linear time-invariant system that satisfies the well-known algebraic condition of observability, we derive a fully observable state equation whose augmented state vector combines output and input sequences. The derived representation allows reinforcement learning methods designed for fully observable linear quadratic regulation problems to be applied to partially observable ones, while avoiding the problems that arise in existing methods.
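The idea of an augmented state built from past outputs and inputs can be illustrated with a minimal NumPy sketch. It uses a textbook construction and is not necessarily the exact derivation of the paper. It assumes a noise-free system x_{k+1} = A x_k + B u_k, y_k = C x_k with (A, C) observable, a window length N equal to the state dimension, and the stacking order z_k = [y_{k-N}, ..., y_{k-1}, u_{k-N}, ..., u_{k-1}]. The function name `augmented_system` and the example matrices are hypothetical.

```python
import numpy as np

def augmented_system(A, B, C, N):
    """Return (F, G, M) with z_{k+1} = F z_k + G u_k and x_k = M z_k.

    Illustrative construction only; assumes the N-step observability
    matrix of (A, C) has full column rank.
    """
    m = B.shape[1]
    p = C.shape[0]
    mp = np.linalg.matrix_power
    # N-step observability matrix: rows C, CA, ..., CA^{N-1}
    O = np.vstack([C @ mp(A, i) for i in range(N)])
    # Block-Toeplitz map from stacked past inputs to stacked past outputs
    T = np.zeros((N * p, N * m))
    for i in range(N):
        for j in range(i):
            T[i*p:(i+1)*p, j*m:(j+1)*m] = C @ mp(A, i - j - 1) @ B
    # Effect of the past N inputs on x_k: [A^{N-1}B, ..., AB, B]
    ctrl = np.hstack([mp(A, N - 1 - j) @ B for j in range(N)])
    AN_Opinv = mp(A, N) @ np.linalg.pinv(O)
    # x_k = A^N O^+ (Y - T U) + ctrl U = M z_k
    M = np.hstack([AN_Opinv, ctrl - AN_Opinv @ T])
    dz = N * (p + m)
    F = np.zeros((dz, dz))
    G = np.zeros((dz, m))
    F[:(N-1)*p, p:N*p] = np.eye((N-1)*p)            # shift stored outputs
    F[(N-1)*p:N*p, :] = C @ M                       # newest output y_k = C x_k
    F[N*p:N*p+(N-1)*m, N*p+m:] = np.eye((N-1)*m)    # shift stored inputs
    G[N*p+(N-1)*m:, :] = np.eye(m)                  # newest input u_k
    return F, G, M

# Hypothetical example: discretized double integrator, position measured.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
N = A.shape[0]
F, G, M = augmented_system(A, B, C, N)

# Simulate the original system and compare with the augmented dynamics.
rng = np.random.default_rng(0)
K = 20
x = rng.standard_normal(2)
xs, ys = [], []
us = rng.standard_normal((K, 1))
for k in range(K):
    xs.append(x)
    ys.append(C @ x)
    x = A @ x + B @ us[k]
xs, ys = np.array(xs), np.array(ys)

def z_at(k):
    # Stack the N most recent outputs, then the N most recent inputs.
    return np.concatenate([ys[k-N:k].ravel(), us[k-N:k].ravel()])

max_err = 0.0
for k in range(N, K - 1):
    max_err = max(max_err, np.abs(z_at(k + 1) - (F @ z_at(k) + G @ us[k])).max())
    max_err = max(max_err, np.abs(xs[k] - M @ z_at(k)).max())
print("max mismatch:", max_err)
```

Because z_k consists only of measured quantities, a method that expects a fully observable state can in principle be run on z_k in place of x_k. The sketch ignores noise and makes no attempt at a minimal window length.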
