ES-ENAS: Combining Evolution Strategies with Neural Architecture Search at No Extra Cost for Reinforcement Learning

We introduce ES-ENAS, a simple neural architecture search (NAS) algorithm for reinforcement learning (RL) policy design, which combines Evolution Strategies (ES) [30, 40] and Efficient NAS (ENAS) [36, 54, 61] in a highly scalable and intuitive way. Our main insight is that ES is already a distributed blackbox algorithm, so we may simply insert a model controller from ENAS into the central aggregator of ES and obtain weight-sharing properties for free. This relatively simple marriage of two different lines of research bridges the gap from NAS in supervised learning settings to the reinforcement learning scenario, and we are among the first to apply controller-based NAS techniques to RL. We demonstrate the utility of our method by training combinatorial neural network architectures for continuous control RL problems via edge pruning and weight sharing. We also incorporate a wide variety of popular techniques from the modern NAS literature, including multiobjective optimization and varying controller methods, to showcase their promise in the RL field and discuss possible extensions. We achieve > 90% network compression for multiple tasks, which may be of special interest in mobile robotics [15] with limited storage and computational resources.
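To make the central-aggregator mechanism concrete, below is a minimal Python sketch of the loop described above, assuming a toy blackbox reward in place of an RL rollout. All names and hyperparameters here (evaluate, sample_mask, DIM, POP, and so on) are illustrative assumptions rather than the paper's actual implementation, and the controller is simplified to a regularized-evolution-style mutation over binary edge-pruning masks.

import numpy as np

DIM = 64        # number of shared policy weights (toy size)
POP = 32        # perturbations / workers per iteration
SIGMA = 0.1     # ES perturbation scale
LR = 0.02       # ES step size
SPARSITY = 0.1  # fraction of edges kept by a freshly sampled mask

rng = np.random.default_rng(0)


def evaluate(weights, mask):
    """Placeholder blackbox reward standing in for an RL rollout return."""
    return float(np.sum((weights * mask) ** 2) - 0.01 * np.sum(weights ** 2))


def sample_mask(parents):
    """Toy controller: mutate a parent edge-pruning mask, or sample a fresh sparse one."""
    if parents and rng.random() < 0.8:
        mask = parents[rng.integers(len(parents))].copy()
        flip = rng.integers(DIM, size=2)   # small mutation
        mask[flip] = 1 - mask[flip]
        return mask
    return (rng.random(DIM) < SPARSITY).astype(np.float64)


weights = np.zeros(DIM)   # shared weights, updated by ES
parents = []              # best masks seen so far, kept by the controller

for iteration in range(200):
    eps = rng.standard_normal((POP, DIM))               # Gaussian ES perturbations
    masks = [sample_mask(parents) for _ in range(POP)]  # one architecture per worker
    # Each "worker" evaluates one (perturbed weights, sampled architecture) pair.
    rewards = np.array([
        evaluate(weights + SIGMA * eps[i], masks[i]) for i in range(POP)
    ])

    # ES update on the shared weights (vanilla score-function estimator).
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    weights += LR / (POP * SIGMA) * eps.T @ advantages

    # Controller update: keep the top masks as parents (regularized-evolution style).
    top = np.argsort(rewards)[-4:]
    parents = [masks[i] for i in top]

Note that each sampled mask is scored by the same rollout that the ES weight update already requires, which is the sense in which the controller and weight sharing come at no extra sampling cost.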

[1] Petros Koumoutsakos, et al. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES), 2003, Evolutionary Computation.

[2] Hanxiao Liu, et al. PyGlove: Symbolic Programming for Automated Machine Learning, 2021, NeurIPS.

[3] Yuandong Tian, et al. Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP, 2019, ICLR.

[4] S. Hewitt, et al. 1987, 1987, Literatur in der SBZ/DDR.

[5] V. Rich. Personal communication, 1989, Nature.

[6] Wenbo Gao, et al. ES-MAML: Simple Hessian-Free Meta Learning, 2020, ICLR.

[7] Quoc V. Le, et al. Efficient Neural Architecture Search via Parameter Sharing, 2018, ICML.

[8] Risto Miikkulainen, et al. Evolving Neural Networks through Augmenting Topologies, 2002, Evolutionary Computation.

[9] Chen Liang, et al. AutoML-Zero: Evolving Machine Learning Algorithms From Scratch, 2020, ICML.

[10] Aaron Klein, et al. NAS-Bench-101: Towards Reproducible Neural Architecture Search, 2019, ICML.

[11] Kalyanmoy Deb, et al. A Comparative Analysis of Selection Schemes Used in Genetic Algorithms, 1990, FOGA.

[12] Peter I. Frazier, et al. A Tutorial on Bayesian Optimization, 2018, ArXiv.

[13] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[14] Sham M. Kakade, et al. Stochastic Convex Optimization with Bandit Feedback, 2011, SIAM J. Optim.

[15] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.

[16] J. Schulman, et al. Leveraging Procedural Generation to Benchmark Reinforcement Learning, 2019, ICML.

[17] Tom Schaul, et al. Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods, 2019, ArXiv.

[18] Navdeep Jaitly, et al. Pointer Networks, 2015, NIPS.

[19] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.

[20] Max Welling, et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.

[21] Kaiming He, et al. Exploring Randomly Wired Neural Networks for Image Recognition, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.

[23] Tobias Glasmachers, et al. Challenges in High-dimensional Reinforcement Learning with Evolution Strategies, 2018, PPSN.

[24] Quoc V. Le, et al. The Evolved Transformer, 2019, ICML.

[25] Richard E. Turner, et al. Structured Evolution with Compact Architectures for Scalable Policy Optimization, 2018, ICML.

[26] Benjamin Recht, et al. Simple random search of static linear policies is competitive for reinforcement learning, 2018, NeurIPS.

[27] Bo Chen, et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Sehoon Ha, et al. Learning Fast Adaptation With Meta Strategy Optimization, 2020, IEEE Robotics and Automation Letters.

[29] Rainer Storn, et al. Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces, 1997, J. Glob. Optim.

[30] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.

[31] Xingyou Song, et al. Observational Overfitting in Reinforcement Learning, 2019, ICLR.

[32] Yves Chauvin, et al. A Back-Propagation Algorithm with Optimal Use of Hidden Units, 1988, NIPS.

[33] Robert D. Nowak, et al. Query Complexity of Derivative-Free Optimization, 2012, NIPS.

[34] Julian Togelius, et al. Playing Atari with Six Neurons, 2018, AAMAS.

[35] Michael C. Mozer, et al. Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment, 1988, NIPS.

[36] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.

[37] Alok Aggarwal, et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.

[38] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.

[39] Atil Iscen, et al. Provably Robust Blackbox Optimization for Reinforcement Learning, 2019, CoRL.

[40] Adam Gaier, et al. Weight Agnostic Neural Networks, 2019, NeurIPS.

[41] D. Sculley, et al. Google Vizier: A Service for Black-Box Optimization, 2017, KDD.

[42] A. Shamsai, et al. Multi-objective Optimization, 2017, Encyclopedia of Machine Learning and Data Mining.

[43] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.

[44] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.

[45] Navdeep Jaitly, et al. Robotic Table Tennis with Model-Free Reinforcement Learning, 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[46] Willie Neiswanger, et al. BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search, 2021, AAAI.

[47] Chelsea Finn, et al. Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning, 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[48] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.

[49] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.

[50] Elad Eban, et al. Structured Multi-Hashing for Model Compression, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Geoffrey E. Hinton, et al. Learning Sparse Networks Using Targeted Dropout, 2019, ArXiv.

[52] Christopher D. Manning, et al. Compression of Neural Machine Translation Models via Pruning, 2016, CoNLL.

[53] Atil Iscen, et al. Policies Modulating Trajectory Generators, 2018, CoRL.

[54] Erich Elsen, et al. Exploring Sparsity in Recurrent Neural Networks, 2017, ICLR.

[55] Taehoon Kim, et al. Quantifying Generalization in Reinforcement Learning, 2018, ICML.

[56] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[57] Benjamin Recht, et al. Simple random search provides a competitive approach to reinforcement learning, 2018, ArXiv.

[58] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.

[59] Yixin Chen, et al. Compressing Neural Networks with the Hashing Trick, 2015, ICML.

[60] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.