暂无分享,去创建一个
[1] Kevin Gimpel,et al. Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units , 2016, ArXiv.
[2] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[3] Omer Levy,et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems , 2019, NeurIPS.
[4] Quoc V. Le,et al. Swish: a Self-Gated Activation Function , 2017, 1710.05941.
[5] Yoshua Bengio,et al. Deep Sparse Rectifier Neural Networks , 2011, AISTATS.
[6] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[7] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[8] Noam Shazeer,et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , 2018, ICML.
[9] Quoc V. Le,et al. Searching for Activation Functions , 2018, arXiv.
[10] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[11] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[12] Geoffrey E. Hinton,et al. Three new graphical models for statistical language modelling , 2007, ICML '07.