Efficient robust policy optimization

We provide efficient algorithms that calculate first- and second-order gradients of the cost of a control law with respect to its parameters, in order to speed up policy optimization. We achieve robustness by simultaneously designing one control law for multiple models, potentially with different model structures, that represent model uncertainty and unmodeled dynamics. Providing explicit examples of possible unmodeled dynamics during control design is easier for the designer, and is more effective at increasing robustness, than injecting simulated perturbations, as is currently done in machine learning. Our approach supports the design of deterministic nonlinear and time-varying controllers for both deterministic and stochastic nonlinear and time-varying systems, including policies with internal state such as observers or other state estimators. We highlight the benefit of control laws made up of collections of simple policies in which only one component policy is active at a time. Controller optimization and learning are particularly fast and effective in this setting because the derivatives of the component policies are decoupled.
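To make the multi-model idea concrete, here is a minimal sketch, not the paper's implementation: a single linear-feedback policy is rolled out on a nominal model and on a perturbed variant that stands in for unmodeled dynamics, and JAX automatic differentiation supplies the first- and second-order derivatives of the summed trajectory cost with respect to the policy gains. The dynamics, cost weights, horizon, and initial gains below are all illustrative assumptions.

```python
# Sketch of multi-model robust policy optimization (illustrative, not the
# paper's code): one policy, several models, exact gradients and Hessians
# of the summed trajectory cost via automatic differentiation.
import jax
import jax.numpy as jnp

HORIZON = 50  # assumed rollout length

def policy(theta, x):
    """Linear state feedback u = -K x, with the gains K packed into theta."""
    K = theta.reshape(1, 2)
    return -(K @ x)

def rollout_cost(theta, dynamics, x0):
    """Quadratic trajectory cost of the policy under one dynamics model."""
    def step(x, _):
        u = policy(theta, x)
        x_next = dynamics(x, u)
        cost = jnp.sum(x**2) + 0.1 * jnp.sum(u**2)
        return x_next, cost
    _, costs = jax.lax.scan(step, x0, None, length=HORIZON)
    return jnp.sum(costs)

# Nominal model plus a perturbed variant standing in for unmodeled dynamics.
def nominal(x, u):
    A = jnp.array([[1.0, 0.1], [0.0, 1.0]])
    B = jnp.array([[0.0], [0.1]])
    return A @ x + B @ u

def perturbed(x, u):
    # Extra state-dependent term the nominal model omits.
    return nominal(x, u) + jnp.array([0.0, -0.02]) * x[0]

def total_cost(theta):
    """One control law evaluated on all models; costs are summed."""
    x0 = jnp.array([1.0, 0.0])
    return sum(rollout_cost(theta, f, x0) for f in (nominal, perturbed))

theta = jnp.array([1.0, 1.0])          # initial feedback gains (assumed)
grad = jax.grad(total_cost)(theta)     # first-order derivative
hess = jax.hessian(total_cost)(theta)  # second-order derivative

# One Newton step on the policy parameters using both derivative orders.
theta_new = theta - jnp.linalg.solve(hess, grad)
```

When the control law is a collection of simple policies with only one component active at a time, as the abstract describes, the Hessian of the total cost is block diagonal in the component parameters, so the Newton step above decomposes into small independent solves, one per component policy.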
