Learning the optimal state-feedback using deep networks

We investigate the use of deep artificial neural networks to approximate the optimal state-feedback control of continuous-time, deterministic, non-linear systems. The networks are trained in a supervised manner on trajectories generated by solving the optimal control problem with the Hermite-Simpson transcription method. We find that deep networks can represent the optimal state-feedback with high accuracy and precision, even well outside the training area. We consider non-linear dynamical models under different cost functions that result in both smooth and discontinuous (bang-bang) optimal control solutions. In particular, we investigate three problems: inverted pendulum swing-up and stabilization, multicopter pin-point landing, and spacecraft free landing. Across all domains, deep networks significantly outperform shallow networks in their ability to build an accurate functional representation of the optimal control. For the spacecraft and multicopter landing problems, deep networks consistently achieve safe landings even when starting well outside the training area.
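As an illustration of the Hermite-Simpson transcription mentioned above, the following is a minimal sketch of the collocation defect for a single time segment. It is not the authors' implementation. The function names, the normalized pendulum model, and the choice of a linearly interpolated midpoint control are all assumptions made for the example. A transcription would require this defect to vanish on every segment and pass those equality constraints to a non-linear programming solver.

```python
import numpy as np


def pendulum(x, u):
    """Hypothetical normalized inverted pendulum, x = [theta, omega].

    theta is measured from the upright position; this model is an
    assumption for illustration, not the model used in the paper.
    """
    theta, omega = x
    return np.array([omega, np.sin(theta) + u])


def hermite_simpson_defect(f, x0, x1, u0, u1, h):
    """Hermite-Simpson defect over one segment of length h.

    f      : dynamics, called as f(x, u) and returning dx/dt
    x0, x1 : states at the two ends of the segment
    u0, u1 : controls at the two ends of the segment
    The returned vector is zero when (x0, x1) satisfy the
    compressed Hermite-Simpson collocation condition.
    """
    f0 = f(x0, u0)
    f1 = f(x1, u1)
    # Midpoint state from the cubic Hermite interpolant.
    x_mid = 0.5 * (x0 + x1) + (h / 8.0) * (f0 - f1)
    # Assumption: the control is interpolated linearly within the segment.
    u_mid = 0.5 * (u0 + u1)
    f_mid = f(x_mid, u_mid)
    # Simpson quadrature of the dynamics across the segment.
    return x1 - x0 - (h / 6.0) * (f0 + 4.0 * f_mid + f1)


if __name__ == "__main__":
    h = 0.05
    x0 = np.array([np.pi, 0.0])    # pendulum hanging down, at rest
    x1 = np.array([np.pi, 0.01])   # arbitrary guess for the next node
    print(hermite_simpson_defect(pendulum, x0, x1, 0.2, 0.2, h))
```

In a full transcription the node states and controls are the decision variables, and the solver drives every segment's defect to zero while minimizing the cost. The optimal state-control pairs along the resulting trajectories could then serve as the supervised training set described above.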
