Shared Linear Quadratic Regulation Control: A Reinforcement Learning Approach

We propose a controller synthesis for state regulation problems in which a human operator shares control with an autonomy system running in parallel. The autonomy system continuously improves on the human's action with minimal intervention and can take full control if necessary. It drives the plant by additively combining the user's input with an adaptive optimal corrective signal. It is adaptive in the sense that it neither estimates nor requires a model of the human's action policy or of the plant's internal dynamics, and it can adjust to changes in both. Our contribution is twofold. First, we present a new controller synthesis for shared control, which we formulate as an adaptive optimal control problem for continuous-time linear systems and solve online as human-in-the-loop reinforcement learning. The result is an architecture we call the shared linear quadratic regulator (sLQR). Second, we provide a new analysis of reinforcement learning for continuous-time linear systems in two parts. In the first part, we avoid learning along a single state-space trajectory, which we show leads to data collinearity under certain conditions. In doing so, we make a clear separation between exploitation of learned policies and exploration of the state space, and we propose an exploration scheme that switches to new state-space trajectories rather than injecting noise continuously while learning. Avoiding continuous noise injection minimizes interference with the human's action and avoids bias in the convergence to the stabilizing solution of the underlying algebraic Riccati equation. We show that exploring a minimum number of pairwise distinct state-space trajectories is necessary to avoid collinearity in the learning data. In the second part, we establish conditions under which existence and uniqueness of solutions can be guaranteed for off-policy reinforcement learning in continuous-time linear systems; namely, prior knowledge of the input matrix.
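The policy-iteration structure underlying the learning scheme is Kleinman's iterative technique for the continuous-time algebraic Riccati equation: repeatedly evaluate the current gain by solving a Lyapunov equation, then improve the gain from the resulting value matrix. The following is a minimal model-based sketch of that iteration (the paper's method is model-free and learns online; here the plant, weights, and initial stabilizing gain are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Hypothetical plant: a double integrator (assumed for illustration only).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)   # state cost
R = np.eye(1)   # input cost

def kleinman_policy_iteration(A, B, Q, R, K0, iters=15):
    """Model-based policy iteration for the continuous-time ARE.

    Starting from a stabilizing gain K0, each step solves the Lyapunov
    (policy-evaluation) equation
        (A - B K)^T P + P (A - B K) = -(Q + K^T R K)
    and then performs the policy-improvement update K = R^{-1} B^T P.
    """
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # Policy evaluation: solve Ak^T P + P Ak = -(Q + K^T R K) for P.
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Policy improvement.
        K = np.linalg.solve(R, B.T @ P)
    return P, K

K0 = np.array([[1.0, 2.0]])  # any initial gain making A - B K0 Hurwitz
P_pi, K_pi = kleinman_policy_iteration(A, B, Q, R, K0)
P_are = solve_continuous_are(A, B, Q, R)  # direct ARE solution for comparison
print(np.max(np.abs(P_pi - P_are)))
```

The iterates converge monotonically (and quadratically) to the stabilizing ARE solution; a model-free variant replaces the Lyapunov solve with a least-squares fit over collected trajectory data, which is where the data-collinearity and exploration issues discussed above arise.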
