Learning Architectures from an Extended Search Space for Language Modeling

Neural architecture search (NAS) has advanced significantly in recent years, but most NAS systems restrict the search to learning the architecture of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learning both intra-cell and inter-cell architectures, which we call ESS. To obtain better search results, we design a joint learning method that performs intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture search system. For recurrent neural language modeling, it significantly outperforms a strong baseline on the PTB and WikiText data, achieving a new state of the art on PTB. Moreover, the learned architectures transfer well to other systems: for example, they improve state-of-the-art systems on the CoNLL and WNUT named entity recognition (NER) tasks and the CoNLL chunking task, indicating a promising line of research on large-scale pre-learned architectures.
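To make the idea of searching over both intra-cell and inter-cell structure concrete, the sketch below shows what a DARTS-style relaxation with two sets of architecture parameters could look like in PyTorch. It is a minimal illustration under our own assumptions: the candidate operation lists (INTRA_OPS, INTER_OPS), the module names (MixedIntraOp, MixedInterOp), and the parameters alpha and beta are hypothetical and not taken from the paper's actual search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative candidate sets (assumptions, not the paper's operation lists):
# intra-cell ops transform a hidden state inside a cell; inter-cell ops decide
# how the current hidden state is combined with a state from a previous cell.
INTRA_OPS = [torch.tanh, torch.sigmoid, torch.relu, lambda x: x]        # identity last
INTER_OPS = [lambda h, c: h, lambda h, c: c, lambda h, c: h + c]        # skip / copy / sum

class MixedIntraOp(nn.Module):
    """DARTS-style mixed op for an intra-cell edge: a softmax over learnable
    architecture weights blends all candidate activations of a linear transform."""
    def __init__(self, hidden_size):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.alpha = nn.Parameter(torch.zeros(len(INTRA_OPS)))  # intra-cell arch weights

    def forward(self, x):
        h = self.linear(x)
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(h) for wi, op in zip(w, INTRA_OPS))

class MixedInterOp(nn.Module):
    """Mixed op for an inter-cell edge: blends candidate ways of connecting the
    current state with a state carried over from a previous cell."""
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(len(INTER_OPS)))   # inter-cell arch weights

    def forward(self, h, prev_cell_state):
        w = F.softmax(self.beta, dim=0)
        return sum(wi * op(h, prev_cell_state) for wi, op in zip(w, INTER_OPS))
```

In a joint search of this kind, the network weights and both groups of architecture parameters (alpha for intra-cell edges, beta for inter-cell edges) would be optimized together in the usual differentiable-NAS fashion, and the highest-weighted operation on each edge would be kept when the final architecture is derived.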
