WeNet: Weighted Networks for Recurrent Network Architecture Search

In recent years, there has been increasing demand for automatic architecture search in deep learning. Numerous approaches have been proposed and have led to state-of-the-art results in various applications, including image classification and language modeling. In this paper, we propose a novel approach to architecture search by means of weighted networks (WeNet), which consist of a number of candidate networks, each assigned a weight. These weights are updated with back-propagation to reflect the relative importance of the different networks; such weighted networks bear similarity to mixtures of experts. We conduct experiments on Penn Treebank and WikiText-2 and show that the proposed WeNet can find recurrent architectures that achieve state-of-the-art performance.
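
To make the mechanism concrete, the following is a minimal sketch of a weighted network over candidate recurrent cells, assuming the learnable weights are softmax-normalized and the mixture is a weighted sum of candidate outputs; the candidate set, the `WeNetCell` name, and the softmax normalization are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a WeNet-style weighted recurrent cell (PyTorch).
# Each candidate network gets a learnable weight updated by back-propagation;
# the softmax normalization is an assumption for this illustration.
import torch
import torch.nn as nn


class WeNetCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Candidate recurrent cells; the concrete candidate set is an assumption.
        self.candidates = nn.ModuleList([
            nn.RNNCell(input_size, hidden_size, nonlinearity="tanh"),
            nn.RNNCell(input_size, hidden_size, nonlinearity="relu"),
            nn.GRUCell(input_size, hidden_size),
        ])
        # One learnable weight per candidate network.
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x, h):
        # Normalize weights so they reflect relative importance of candidates.
        w = torch.softmax(self.alpha, dim=0)
        # Mix the candidate outputs; gradients flow into both the candidate
        # parameters and the mixture weights alpha.
        return sum(w[i] * cell(x, h) for i, cell in enumerate(self.candidates))


# Usage: treat the weighted cell like any recurrent cell.
cell = WeNetCell(input_size=32, hidden_size=64)
x = torch.randn(8, 32)                # batch of 8 input vectors
h = torch.zeros(8, 64)                # initial hidden state
h = cell(x, h)                        # one recurrent step
h.sum().backward()                    # back-prop updates alpha and cell weights
print(torch.softmax(cell.alpha, 0))   # current importance of each candidate
```

After training, the candidate with the largest weight can be read off as the selected architecture, in the same spirit as differentiable search methods such as DARTS.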
