Quoc V. Le | Hanxiao Liu | Noam Shazeer | Zihang Dai | David R. So | Wojciech Mańke
[1] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.
[2] Alexei Baevski, et al. Adaptive Input Representations for Neural Language Modeling, 2018, ICLR.
[3] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[4] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[5] Danqi Chen, et al. Making Pre-trained Language Models Better Few-shot Learners, 2021, ACL/IJCNLP.
[6] Martin Jaggi, et al. Evaluating the Search Phase of Neural Architecture Search, 2019, ICLR.
[7] Noam Shazeer, et al. GLU Variants Improve Transformer, 2020, ArXiv.
[8] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[9] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[10] Rico Sennrich, et al. Root Mean Square Layer Normalization, 2019, NeurIPS.
[11] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[12] Quoc V. Le, et al. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension, 2018, ICLR.
[13] Ameet Talwalkar, et al. Random Search and Reproducibility for Neural Architecture Search, 2019, UAI.
[14] Thorsten Brants, et al. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling, 2013, INTERSPEECH.
[15] Bo Chen, et al. Can Weight Sharing Outperform Random Architecture Search? An Investigation With TuNAS, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Timothy P. Lillicrap, et al. Compressive Transformers for Long-Range Sequence Modelling, 2019, ICLR.
[17] Chen Liang, et al. Carbon Emissions and Large Neural Network Training, 2021, ArXiv.
[18] Frank Hutter, et al. Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution, 2018, ICLR.
[19] N. Codella, et al. CvT: Introducing Convolutions to Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Liwei Wang, et al. On Layer Normalization in the Transformer Architecture, 2020, ICML.
[21] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[22] Taku Kudo, et al. SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing, 2018, EMNLP.
[23] Quoc V. Le, et al. The Evolved Transformer, 2019, ICML.
[24] Yee Whye Teh, et al. Multiplicative Interactions and Where to Find Them, 2020, ICLR.
[25] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[26] Alok Aggarwal, et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.
[27] Quoc V. Le, et al. Towards a Human-like Open-Domain Chatbot, 2020, ArXiv.
[28] Ameet Talwalkar, et al. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, 2016, J. Mach. Learn. Res.
[29] X. Yao. Evolving Artificial Neural Networks, 1999.
[30] Chen Liang, et al. AutoML-Zero: Evolving Machine Learning Algorithms From Scratch, 2020, ICML.
[31] Risto Miikkulainen, et al. Designing Neural Networks Through Neuroevolution, 2019, Nat. Mach. Intell.
[32] Quoc V. Le, et al. Large-Scale Evolution of Image Classifiers, 2017, ICML.
[33] Ashish Vaswani, et al. Self-Attention with Relative Position Representations, 2018, NAACL.
[34] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[35] Oriol Vinyals, et al. Hierarchical Representations for Efficient Architecture Search, 2017, ICLR.
[36] Yuan Yu, et al. TensorFlow: A System for Large-Scale Machine Learning, 2016, OSDI.
[37] Tara N. Sainath, et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling, 2019, ArXiv.
[38] Bo Chen, et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Noam Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, ArXiv.
[40] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[41] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[42] John J. Hopfield, et al. Dense Associative Memory for Pattern Recognition, 2016, NIPS.
[43] Lee Spector, et al. Program Synthesis Using Uniform Mutation by Addition and Deletion, 2018, GECCO.
[44] Yi Tay, et al. Synthesizer: Rethinking Self-Attention for Transformer Models, 2020, ICML.
[45] Quoc V. Le, et al. Semi-supervised Sequence Learning, 2015, NIPS.
[46] Madian Khabsa, et al. Entailment as Few-Shot Learner, 2021, ArXiv.
[47] Samy Bengio, et al. Tensor2Tensor for Neural Machine Translation, 2018, AMTA.
[48] Hinrich Schütze, et al. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners, 2020, NAACL.
[49] Colin Raffel, et al. Do Transformer Modifications Transfer Across Implementations and Applications?, 2021, EMNLP.
[50] Oren Somekh, et al. Almost Optimal Exploration in Multi-Armed Bandits, 2013, ICML.
[51] Song Han, et al. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, 2018, ICLR.
[52] Kevin Gimpel, et al. Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units, 2016, ArXiv.
[53] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[54] Quoc V. Le, et al. Searching for Activation Functions, 2018, ArXiv.
[55] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[56] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.