Discontinuity-Sensitive Optimal Control Learning by Mixture of Experts

This paper proposes a machine learning method for predicting the solutions of related nonlinear optimal control problems from a parametric input, such as the initial state. The map from problem parameters to optimal solutions, called the problem-optimum map, is often discontinuous due to nonconvexity, discrete homotopy classes, and control switching. These discontinuities cause difficulties for traditional function approximators such as neural networks, which assume continuity of the underlying function. The proposed model is a mixture of experts (MoE) composed of a classifier and several regressors, where each regressor is tuned to a particular continuous region. A novel training approach is proposed that trains the classifier and the regressors independently. MoE greatly outperforms standard neural networks and achieves highly reliable trajectory prediction (over 99.5% accuracy) in several dynamic vehicle control problems.
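
The classifier-plus-regressors structure can be illustrated with a toy sketch. This is not the paper's implementation: the 1-D `target` map, the threshold used to produce region labels, the nearest-neighbor classifier, and the linear experts are all assumptions chosen to keep the example small. The paper's own models and its procedure for identifying continuous regions are not described in the abstract. The sketch only shows that the classifier and the experts can be fit independently once each sample has a region label.

```python
import numpy as np

def target(x):
    # Hypothetical discontinuous "problem-optimum map": jumps by 5 at x = 0.
    return np.where(x < 0.0, 2.0 * x, 2.0 * x + 5.0)

def fit_linear(x, y):
    # Least-squares fit of y ~ a * x + b; returns the pair (a, b).
    A = np.stack([x, np.ones_like(x)], axis=1)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return coef[0] * x + coef[1]

def train_moe(x, y, labels, n_experts):
    # Experts: one regressor per region, each fit only on that region's data.
    experts = [fit_linear(x[labels == k], y[labels == k]) for k in range(n_experts)]
    # Classifier: a 1-nearest-neighbor rule that just stores (input, label)
    # pairs. It never looks at the experts, so the two parts are independent.
    classifier = (x.copy(), labels.copy())
    return classifier, experts

def predict_moe(classifier, experts, xq):
    xs, ls = classifier
    # Step 1: the classifier picks a region for every query point.
    nearest = np.abs(xq[:, None] - xs[None, :]).argmin(axis=1)
    region = ls[nearest]
    # Step 2: only the selected expert evaluates each query point.
    out = np.empty_like(xq)
    for k, coef in enumerate(experts):
        mask = region == k
        out[mask] = predict_linear(coef, xq[mask])
    return out

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = target(x)
# Assumed labeling step: split samples by output value (a stand-in for
# whatever region-identification procedure is used in practice).
labels = (y >= 2.5).astype(int)

classifier, experts = train_moe(x, y, labels, n_experts=2)
xq = np.array([-0.5, 0.5])
pred_moe = predict_moe(classifier, experts, xq)
pred_single = predict_linear(fit_linear(x, y), xq)  # one continuous model

print("MoE error:   ", np.abs(pred_moe - target(xq)).max())
print("single error:", np.abs(pred_single - target(xq)).max())
```

A single continuous model has to smear the jump across both regions, while each expert in the mixture only ever sees data from one side of the discontinuity.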
