Cocktail: Learn a Better Neural Network Controller from Multiple Experts via Adaptive Mixing and Robust Distillation

Neural networks are increasingly applied to control and decision making in learning-enabled cyber-physical systems (LE-CPSs). They have shown promising performance without requiring the development of complex physical models; however, their adoption is significantly hindered by concerns about their safety, robustness, and efficiency. In this work, we propose COCKTAIL, a novel design framework that automatically learns a neural-network-based controller from multiple existing control methods (experts), which may be either model-based or neural-network-based. In particular, COCKTAIL first uses reinforcement learning to learn an optimal system-level adaptive mixing strategy that combines the underlying experts with dynamically assigned weights. It then performs teacher-student distillation with probabilistic adversarial training and regularization to synthesize a student neural network controller with improved control robustness (measured by a safe control rate metric with respect to adversarial attacks or measurement noise), control energy efficiency, and verifiability (measured by the computation time for verification). Experiments on three non-linear systems demonstrate significant advantages of our approach in these properties over various baseline methods.
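
The adaptive mixing step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mix_experts`, the two linear feedback experts, the fixed weight function, and the actuator bounds are all hypothetical, and clipping the mixed output is an assumption made here for illustration. In the framework described above, the weights would come from a policy learned by reinforcement learning rather than from a fixed function.

```python
import numpy as np


def mix_experts(state, experts, weight_fn, u_min, u_max):
    """Combine expert control outputs with state-dependent weights.

    `weight_fn` is a hypothetical stand-in for a learned mixing policy;
    it returns one weight per expert for the given state.
    """
    weights = np.asarray(weight_fn(state), dtype=float)
    # Stack expert outputs into an array of shape (n_experts, action_dim).
    actions = np.stack([np.atleast_1d(expert(state)) for expert in experts])
    mixed = weights @ actions
    # Assumption: the mixed control is clipped to actuator bounds.
    return np.clip(mixed, u_min, u_max)


# Two hypothetical experts: linear state feedback with different gains.
experts = [lambda s: -1.0 * s, lambda s: -3.0 * s]

# Hypothetical fixed weights; a learned policy would vary these with the state.
weight_fn = lambda s: [0.25, 0.75]

state = np.array([2.0])
control = mix_experts(state, experts, weight_fn, u_min=-4.0, u_max=4.0)
print(control)
```

In a teacher-student setup, the mixed controller would play the role of the teacher whose outputs the student network is trained to reproduce; that training step is not sketched here.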
