Paper Information - Cocktail: Learn a Better Neural Network Controller from Multiple Experts via Adaptive Mixing and Robust Distillation

Neural networks are increasingly applied to control and decision making in learning-enabled cyber-physical systems (LE-CPSs). They have shown promising performance without requiring complex physical models; however, their adoption is significantly hindered by concerns about their safety, robustness, and efficiency. In this work, we propose COCKTAIL, a novel design framework that automatically learns a neural-network-based controller from multiple existing control methods (experts), which may be either model-based or neural-network-based. COCKTAIL first uses reinforcement learning to learn an optimal system-level adaptive mixing strategy that combines the underlying experts with dynamically assigned weights. It then performs teacher-student distillation with probabilistic adversarial training and regularization to synthesize a student neural network controller with improved control robustness (measured by a safe control rate metric with respect to adversarial attacks or measurement noise), control energy efficiency, and verifiability (measured by the computation time for verification). Experiments on three non-linear systems demonstrate significant advantages of our approach in these properties over various baseline methods.
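
The two stages described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the two linear experts, the softmax form of the mixing weights, the fixed (not RL-trained) parameter `theta`, the linear student, and the FGSM-style perturbation applied with probability `p_adv` are all assumptions chosen to keep the example short and runnable with NumPy alone.

```python
import numpy as np

# Two hypothetical linear state-feedback experts (stand-ins for real controllers).
EXPERT_GAINS = np.array([[-1.0, -0.5],
                         [-2.0, -0.1]])

def expert_controls(x):
    """Control input proposed by each expert for state x."""
    return EXPERT_GAINS @ x

def mixing_weights(x, theta):
    """State-dependent expert weights. The softmax form is an assumption of
    this sketch; the paper learns its mixing strategy with reinforcement
    learning, which is not reproduced here."""
    logits = theta @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def teacher_control(x, theta):
    """Mixed (teacher) control: weighted sum of the expert outputs."""
    return float(mixing_weights(x, theta) @ expert_controls(x))

def distill(theta, steps=2000, lr=0.05, eps=0.05, p_adv=0.5, l2=1e-3, seed=0):
    """Fit a linear student u = k @ x to the teacher by SGD. With probability
    p_adv the student's input is perturbed by an FGSM-style step (assumed
    here as one possible reading of 'probabilistic adversarial training')."""
    rng = np.random.default_rng(seed)
    k = np.zeros(2)
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0, size=2)
        target = teacher_control(x, theta)
        x_in = x
        if rng.random() < p_adv:
            err = k @ x - target
            # Gradient of the squared error w.r.t. the student's input is 2*err*k.
            x_in = x + eps * np.sign(2.0 * err * k)
        err = k @ x_in - target
        # SGD step on squared error plus an L2 penalty on the student gains.
        k -= lr * (2.0 * err * x_in + 2.0 * l2 * k)
    return k

if __name__ == "__main__":
    theta = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical mixing parameters
    x = np.array([0.3, -0.2])
    print("weights:", mixing_weights(x, theta))
    print("teacher control:", teacher_control(x, theta))
    print("student gains:", distill(theta))
```

With `theta` set to zeros the weights are uniform, so the teacher reduces to the average of the two experts and the student gains should settle close to that average. A nonzero `theta` makes the weights vary with the state, which is the adaptive part of the mixing.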
