Learning Architectures from an Extended Search Space for Language Modeling

Neural architecture search (NAS) has advanced significantly in recent years, but most NAS systems restrict the search to learning the architecture of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learning both intra-cell and inter-cell architectures, which we call ESS. To obtain better search results, we design a joint learning method that performs intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture search system. For recurrent neural language modeling, it significantly outperforms a strong baseline on the PTB and WikiText data, achieving a new state of the art on PTB. Moreover, the learned architectures transfer well to other systems: for example, they improve state-of-the-art systems on the CoNLL and WNUT named entity recognition (NER) tasks and the CoNLL chunking task, indicating a promising line of research on large-scale pre-learned architectures.
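To make the idea of searching over both intra-cell and inter-cell structure concrete, the sketch below shows what a DARTS-style relaxation with two sets of architecture parameters could look like in PyTorch. It is a minimal illustration under our own assumptions: the candidate operation lists (INTRA_OPS, INTER_OPS), the module names (MixedIntraOp, MixedInterOp), and the parameters alpha and beta are hypothetical and not taken from the paper's actual search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative candidate sets (assumptions, not the paper's operation lists):
# intra-cell ops transform a hidden state inside a cell; inter-cell ops decide
# how the current hidden state is combined with a state from a previous cell.
INTRA_OPS = [torch.tanh, torch.sigmoid, torch.relu, lambda x: x]        # identity last
INTER_OPS = [lambda h, c: h, lambda h, c: c, lambda h, c: h + c]        # skip / copy / sum

class MixedIntraOp(nn.Module):
    """DARTS-style mixed op for an intra-cell edge: a softmax over learnable
    architecture weights blends all candidate activations of a linear transform."""
    def __init__(self, hidden_size):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.alpha = nn.Parameter(torch.zeros(len(INTRA_OPS)))  # intra-cell arch weights

    def forward(self, x):
        h = self.linear(x)
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(h) for wi, op in zip(w, INTRA_OPS))

class MixedInterOp(nn.Module):
    """Mixed op for an inter-cell edge: blends candidate ways of connecting the
    current state with a state carried over from a previous cell."""
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(len(INTER_OPS)))   # inter-cell arch weights

    def forward(self, h, prev_cell_state):
        w = F.softmax(self.beta, dim=0)
        return sum(wi * op(h, prev_cell_state) for wi, op in zip(w, INTER_OPS))
```

In a joint search of this kind, the network weights and both groups of architecture parameters (alpha for intra-cell edges, beta for inter-cell edges) would be optimized together in the usual differentiable-NAS fashion, and the highest-weighted operation on each edge would be kept when the final architecture is derived.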
