Hyperparameter optimization with REINFORCE and Transformers

Reinforcement learning (RL) has yielded promising results for Neural Architecture Search (NAS). In this paper, we demonstrate how its performance can be improved by using a simplified Transformer block to model the policy network. The simplified Transformer uses a two-stream attention mechanism to model hyperparameter dependencies while avoiding layer normalization and positional encoding. We posit that this parsimonious design balances model complexity against expressiveness, making it suitable for discovering optimal architectures in high-dimensional search spaces with limited exploration budgets. We demonstrate how the algorithm's performance can be further improved by (a) using an actor-critic algorithm instead of vanilla policy gradient and (b) ensembling Transformer blocks with shared parameters, each block conditioned on a different autoregressive factorization order. Our algorithm works well both for NAS and for generic hyperparameter optimization (HPO): it outperformed most algorithms on NAS-Bench-101, a public dataset for benchmarking NAS algorithms. In particular, it outperformed RL-based methods that use alternative architectures to model the policy network, underlining the value of attention-based networks in this setting. As a generic HPO algorithm, it outperformed Random Search in discovering more accurate multi-layer perceptron architectures across two regression tasks. We adhered to the guidelines of Lindauer and Hutter [6] when designing our experiments and reporting results.
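To make the approach concrete, the sketch below shows an autoregressive policy built from one Transformer-style block, with causal self-attention and residual connections but no layer normalization and no positional encoding, trained with REINFORCE plus a learned baseline (the actor-critic flavour mentioned above). This is a minimal single-stream simplification under stated assumptions, not the authors' implementation: the two-stream attention and the factorization-order ensemble are omitted, and names such as `PolicyBlock`, `HPOPolicy`, and `reward_fn` are hypothetical.

```python
# Minimal sketch (assumptions, not the paper's code): an autoregressive policy
# samples one hyperparameter per step from a simplified Transformer block and
# is updated with REINFORCE plus a learned baseline.
import torch
import torch.nn as nn


class PolicyBlock(nn.Module):
    """Self-attention block without LayerNorm or positional encoding."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                nn.Linear(d_model, d_model))

    def forward(self, x, mask):
        h, _ = self.attn(x, x, x, attn_mask=mask)  # causal self-attention
        x = x + h                                  # residual, no LayerNorm
        return x + self.ff(x)


class HPOPolicy(nn.Module):
    """Samples a sequence of hyperparameter choices autoregressively."""

    def __init__(self, num_choices, d_model=64):
        super().__init__()
        # One embedding table and one output head per decision slot.
        self.embed = nn.ModuleList([nn.Embedding(n, d_model) for n in num_choices])
        self.heads = nn.ModuleList([nn.Linear(d_model, n) for n in num_choices])
        self.block = PolicyBlock(d_model)
        self.start = nn.Parameter(torch.zeros(1, 1, d_model))  # learned start token
        self.value = nn.Linear(d_model, 1)                     # baseline (critic) head

    def forward(self):
        x, choices, log_probs = self.start, [], []
        for t, head in enumerate(self.heads):
            T = x.size(1)
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
            h = self.block(x, mask)
            dist = torch.distributions.Categorical(logits=head(h[:, -1]))
            a = dist.sample()                      # pick the t-th hyperparameter
            log_probs.append(dist.log_prob(a))
            choices.append(a)
            x = torch.cat([x, self.embed[t](a).unsqueeze(1)], dim=1)
        baseline = self.value(h[:, -1]).squeeze(-1)  # one baseline per episode
        return torch.stack(choices, 1), torch.stack(log_probs, 1).sum(1), baseline


if __name__ == "__main__":
    def reward_fn(config):                         # stand-in for evaluating the
        return config.float().sum().item() / 10.0  # sampled configuration

    policy = HPOPolicy(num_choices=[3, 4, 5])      # e.g. three decision slots
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for step in range(200):
        config, log_prob, baseline = policy()
        reward = reward_fn(config[0])
        advantage = reward - baseline.detach()     # actor-critic style advantage
        loss = -(advantage * log_prob).mean() + (reward - baseline).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In a real run, `reward_fn` would train and validate the sampled configuration, e.g. query NAS-Bench-101 or fit an MLP on the regression task, and return its validation score as the reward.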

[1] Qi Tian, et al. Progressive Differentiable Architecture Search: Bridging the Depth Gap Between Search and Evaluation, 2019, IEEE/CVF International Conference on Computer Vision (ICCV).

[2] Ameet Talwalkar, et al. Random Search and Reproducibility for Neural Architecture Search, 2019, UAI.

[3] Kirthevasan Kandasamy, et al. Neural Architecture Search with Bayesian Optimisation and Optimal Transport, 2018, NeurIPS.

[4] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.

[5] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.

[6] Marius Lindauer, et al. Best Practices for Scientific Research on Neural Architecture Search, 2019, arXiv.

[7] Davide Anguita, et al. Machine learning approaches for improving condition-based maintenance of naval propulsion plants, 2016.

[8] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.

[9] Yiyang Zhao, et al. AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search, 2019, arXiv.

[10] Hugo Larochelle, et al. Neural Autoregressive Distribution Estimation, 2016, J. Mach. Learn. Res.

[11] Aaron Klein, et al. BOHB: Robust and Efficient Hyperparameter Optimization at Scale, 2018, ICML.

[12] Quoc V. Le, et al. Large-Scale Evolution of Image Classifiers, 2017, ICML.

[13] Steffen Bickel, et al. Discriminative Learning Under Covariate Shift, 2009, J. Mach. Learn. Res.

[14] Oriol Vinyals, et al. Hierarchical Representations for Efficient Architecture Search, 2017, ICLR.

[15] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.

[16] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.

[17] Ramesh Raskar, et al. Designing Neural Network Architectures using Reinforcement Learning, 2016, ICLR.

[18] Chris Dyer, et al. On the State of the Art of Evaluation in Neural Language Models, 2017, ICLR.

[19] Tie-Yan Liu, et al. Neural Architecture Optimization, 2018, NeurIPS.

[20] Alan L. Yuille, et al. Genetic CNN, 2017, IEEE International Conference on Computer Vision (ICCV).

[21] Alok Aggarwal, et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.

[22] Pieter Abbeel, et al. PixelSNAIL: An Improved Autoregressive Generative Model, 2017, ICML.

[23] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive computation and machine learning.

[24] Song Han, et al. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, 2018, ICLR.

[25] Thomas Brox, et al. Understanding and Robustifying Differentiable Architecture Search, 2020, ICLR.

[26] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[27] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.

[28] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[29] Yoshua Bengio, et al. Algorithms for Hyper-Parameter Optimization, 2011, NIPS.

[30] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.

[31] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.

[32] Jure Leskovec, et al. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation, 2018, NeurIPS.

[33] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[34] Colin White, et al. BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search, 2019, AAAI.

[35] Mario Lucic, et al. Are GANs Created Equal? A Large-Scale Study, 2017, NeurIPS.

[36] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.

[37] Chris Eliasmith, et al. Hyperopt: a Python library for model selection and hyperparameter optimization, 2015.

[38] Wei Wu, et al. Practical Block-Wise Neural Network Architecture Generation, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39] Alexander M. Rush, et al. Character-Aware Neural Language Models, 2015, AAAI.

[40] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.

[41] Xi Chen, et al. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications, 2017, ICLR.

[42] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.

[43] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.

[45] Aaron Klein, et al. NAS-Bench-101: Towards Reproducible Neural Architecture Search, 2019, ICML.

[46] Kevin Leyton-Brown, et al. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms, 2012, KDD.

[47] Quoc V. Le, et al. Efficient Neural Architecture Search via Parameter Sharing, 2018, ICML.

[48] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[49] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.

[50] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[51] Hugo Larochelle, et al. MADE: Masked Autoencoder for Distribution Estimation, 2015, ICML.

[52] Xiaopeng Zhang, et al. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search, 2020, ICLR.

[53] Ryan P. Adams, et al. Gradient-based Hyperparameter Optimization through Reversible Learning, 2015, ICML.

[54] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.

[55] Li Fei-Fei, et al. Progressive Neural Architecture Search, 2017, ECCV.