Suboptimal coverings for continuous spaces of control tasks

We propose the α-suboptimal covering number as a way to characterize multi-task control problems in which the set of dynamical systems and/or cost functions is infinite; it plays a role analogous to cardinality for finite task sets. This notion may help quantify the function class expressiveness needed to represent a good multi-task policy, which matters for learning-based control methods that use parameterized function approximation. We study suboptimal covering numbers for linear dynamical systems with quadratic cost (LQR problems) and construct a class of multi-task LQR problems amenable to analysis. For the scalar case, we show that the covering number depends logarithmically on the “breadth” of the space. For the matrix case, we present experiments 1) measuring the efficiency of a particular constructive cover, and 2) visualizing the behavior of two candidate systems for the lower bound.
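The scalar case can be illustrated numerically. The sketch below is not the paper's construction. It assumes a hypothetical task family of continuous-time scalar LQR problems with dynamics dx/dt = a x + b u, a = q = r = 1, and input gain b ranging over [1/θ, 1], where θ stands in for the "breadth". It also assumes that a linear controller u = -k x is α-suboptimal for a task when its cost is at most α times the optimal cost for that task. All function names are illustrative. A simple greedy pass over a grid of tasks then gives an upper bound on the covering number of that grid.

```python
import numpy as np

def optimal_gain(b, a=1.0, q=1.0, r=1.0):
    # Optimal gain k* from the scalar continuous-time Riccati equation
    # 2*a*p - (b*p)**2 / r + q = 0, with k* = b*p/r.
    return (a + np.sqrt(a**2 + b**2 * q / r)) / b

def optimal_cost(b, a=1.0, q=1.0, r=1.0):
    # Optimal cost from x(0) = 1, equal to the Riccati solution p.
    return r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2

def closed_loop_cost(k, b, a=1.0, q=1.0, r=1.0):
    # Cost of u = -k*x from x(0) = 1; infinite if the loop is unstable.
    rate = b * k - a
    stable = rate > 0
    safe_rate = np.where(stable, rate, 1.0)  # avoid dividing by <= 0
    return np.where(stable, (q + r * k**2) / (2.0 * safe_rate), np.inf)

def greedy_cover(theta, alpha, n_grid=2000):
    # Assumed task family: b on a log-spaced grid over [1/theta, 1].
    bs = np.geomspace(1.0 / theta, 1.0, n_grid)
    j_star = optimal_cost(bs)
    covered = np.zeros(n_grid, dtype=bool)
    gains = []
    while not covered.all():
        i = int(np.argmin(covered))      # first uncovered task
        k = optimal_gain(bs[i])          # its own optimal controller
        covered |= closed_loop_cost(k, bs) <= alpha * j_star
        covered[i] = True                # guard against round-off
        gains.append(float(k))
    return bs, gains

if __name__ == "__main__":
    for theta in (10.0, 100.0, 1000.0):
        _, gains = greedy_cover(theta, alpha=1.5)
        print(theta, len(gains))
```

The greedy choice here (the optimal controller of the first uncovered task) is only one heuristic, so the printed sizes are upper bounds for the gridded family rather than exact covering numbers.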
