ES-ENAS: BLACKBOX OPTIMIZATION

We introduce ES-ENAS, a simple yet general evolutionary joint-optimization procedure that combines continuous optimization via Evolution Strategies (ES) [38, 29] with combinatorial optimization via Efficient NAS (ENAS) [50, 34, 54] in a highly scalable and intuitive way. Our main insight is that ES is already a highly distributed algorithm involving hundreds of forward passes, which can be used not only to train neural network weights but also to jointly train a NAS controller, both in a blackbox fashion. In doing so, we also bridge the gap between NAS research in supervised learning and the reinforcement learning setting through this relatively simple marriage of two distinct yet common lines of research. We demonstrate the utility and effectiveness of our method over a large search space by training highly combinatorial neural network architectures for RL problems in continuous control, via edge pruning and quantization. We also incorporate a wide variety of popular techniques from the modern NAS literature, including multiobjective optimization and various controller methods, to showcase their promise in the RL field and discuss possible extensions.
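The joint loop described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: each worker receives both a Gaussian weight perturbation (ES) and a sampled architecture from the controller, and the same scalar rewards update the continuous weights (ES gradient estimate) and the controller (here a simple per-edge Bernoulli distribution trained with REINFORCE, standing in for ENAS-style edge pruning). The toy reward, dimensions, and learning rates are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6  # number of prunable edges (toy size)
target = np.array([1.0, -1.0, 0.0, 0.5, 0.0, 0.0])

def reward(w, mask):
    # Toy stand-in for an RL episode return: pruned weights should fit
    # `target`, with a small penalty per kept edge (multiobjective flavor).
    return -np.sum((w * mask - target) ** 2) - 0.05 * mask.sum()

w = np.zeros(d)       # continuous weights, trained by ES
logits = np.zeros(d)  # controller: one Bernoulli logit per prunable edge

npop, sigma, lr_w, lr_c = 32, 0.1, 0.05, 0.2
for _ in range(500):
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = rng.standard_normal((npop, d))                   # ES perturbations
    masks = (rng.random((npop, d)) < probs).astype(float)  # sampled architectures
    rewards = np.array([reward(w + sigma * e, m) for e, m in zip(eps, masks)])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    w += lr_w / (npop * sigma) * eps.T @ adv               # vanilla ES update
    logits += lr_c / npop * (masks - probs).T @ adv        # REINFORCE on controller

final_mask = (1.0 / (1.0 + np.exp(-logits)) > 0.5).astype(float)
```

The key property exploited here is that the rollout rewards are computed once per worker, then reused by both updates, so adding the controller costs essentially no extra forward passes on top of standard distributed ES.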

[1] Yves Chauvin, et al. A Back-Propagation Algorithm with Optimal Use of Hidden Units, 1988, NIPS.

[2] Michael C. Mozer, et al. Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment, 1988, NIPS.

[3] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.

[4] Kalyanmoy Deb, et al. A Comparative Analysis of Selection Schemes Used in Genetic Algorithms, 1990, FOGA.

[5] Rainer Storn, et al. Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, 1997, J. Glob. Optim.

[6] Risto Miikkulainen, et al. Evolving Neural Networks through Augmenting Topologies, 2002, Evolutionary Computation.

[7] Petros Koumoutsakos, et al. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES), 2003, Evolutionary Computation.

[8] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.

[9] Tom Schaul, et al. Natural Evolution Strategies, 2008, IEEE Congress on Evolutionary Computation.

[10] Kenneth O. Stanley, et al. A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks, 2009, Artificial Life.

[11] Verena Heidrich-Meisner, et al. Neuroevolution strategies for episodic reinforcement learning, 2009, J. Algorithms.

[12] Lin Xiao, et al. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback, 2010, COLT.

[13] Sham M. Kakade, et al. Stochastic Convex Optimization with Bandit Feedback, 2011, SIAM J. Optim.

[14] Robert D. Nowak, et al. Query Complexity of Derivative-Free Optimization, 2012, NIPS.

[15] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[16] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.

[17] Navdeep Jaitly, et al. Pointer Networks, 2015, NIPS.

[18] Yixin Chen, et al. Compressing Neural Networks with the Hashing Trick, 2015, ICML.

[19] Christopher D. Manning, et al. Compression of Neural Machine Translation Models via Pruning, 2016, CoNLL.

[20] Song Han, et al. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.

[21] Oswin Krause, et al. CMA-ES with Optimal Covariance Update and Storage Complexity, 2016, NIPS.

[22] Erich Elsen, et al. Exploring Sparsity in Recurrent Neural Networks, 2017, ICLR.

[23] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.

[24] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.

[25] D. Sculley, et al. Google Vizier: A Service for Black-Box Optimization, 2017, KDD.

[26] A. Shamsai, et al. Multi-objective Optimization, 2017, Encyclopedia of Machine Learning and Data Mining.

[27] Anne Auger, et al. A Comparative Study of Large-Scale Variants of CMA-ES, 2018, PPSN.

[28] Max Welling, et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.

[29] Richard E. Turner, et al. Structured Evolution with Compact Architectures for Scalable Policy Optimization, 2018, ICML.

[30] Benjamin Recht, et al. Simple random search of static linear policies is competitive for reinforcement learning, 2018, NeurIPS.

[31] Benjamin Recht, et al. Simple random search provides a competitive approach to reinforcement learning, 2018, arXiv.

[32] Quoc V. Le, et al. Efficient Neural Architecture Search via Parameter Sharing, 2018, ICML.

[33] Geoffrey E. Hinton, et al. Learning Sparse Networks Using Targeted Dropout, 2019, arXiv.

[34] Quoc V. Le, et al. The Evolved Transformer, 2019, ICML.

[35] Tom Schaul, et al. Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods, 2019, arXiv.

[36] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.

[37] Adam Gaier, et al. Weight Agnostic Neural Networks, 2019, NeurIPS.

[38] Alok Aggarwal, et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.

[39] Bo Chen, et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile, 2019, CVPR.

[40] Aaron Klein, et al. NAS-Bench-101: Towards Reproducible Neural Architecture Search, 2019, ICML.

[41] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.

[42] Oswin Krause, et al. Large-scale noise-resilient evolution-strategies, 2019, GECCO.

[43] Taehoon Kim, et al. Quantifying Generalization in Reinforcement Learning, 2018, ICML.

[44] Quoc V. Le, et al. AutoML-Zero: Evolving Machine Learning Algorithms From Scratch, 2020, ICML.

[45] Elad Eban, et al. Structured Multi-Hashing for Model Compression, 2020, CVPR.

[46] Xingyou Song, et al. Observational Overfitting in Reinforcement Learning, 2019, ICLR.

[47] Yi Yang, et al. NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search, 2020, ICLR.

[48] J. Schulman, et al. Leveraging Procedural Generation to Benchmark Reinforcement Learning, 2019, ICML.

[49] Hanxiao Liu, et al. PyGlove: Symbolic Programming for Automated Machine Learning, 2021, NeurIPS.

[50] Evolving Reinforcement Learning Algorithms, 2021, ICLR.