Mark Chen | Alec Radford | Heewoo Jun | Prafulla Dhariwal | Aditya Ramesh | John Schulman | Christopher Hesse | Tom Henighan | Jared Kaplan | Mor Katz | Jacob Jackson | Tom B. Brown | Scott Gray | Chris Hallacy | Benjamin Mann | Nick Ryder | Daniel M. Ziegler | Dario Amodei | Sam McCandlish
[1] David A. Shamma et al. The New Data and New Challenges in Multimedia Research, 2015, ArXiv.
[2] Koray Kavukcuoglu et al. Pixel Recurrent Neural Networks, 2016, ICML.
[3] Lukasz Kaiser et al. Attention is All you Need, 2017, NIPS.
[4] Frank Hutter et al. Fixing Weight Decay Regularization in Adam, 2017, ArXiv.
[5] Frank Hutter et al. A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets, 2017, ArXiv.
[6] Oriol Vinyals et al. Neural Discrete Representation Learning, 2017, NIPS.
[7] Lukasz Kaiser et al. Generating Wikipedia by Summarizing Long Sequences, 2018, ICLR.
[8] Dario Amodei et al. An Empirical Model of Large-Batch Training, 2018, ArXiv.
[9] Raef Bassily et al. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning, 2017, ICML.
[10] Arthur Jacot et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[11] Ilya Sutskever et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[12] Aran Komatsuzaki. One Epoch Is All You Need, 2019, ArXiv.
[13] Ruslan Salakhutdinov et al. Multimodal Transformer for Unaligned Multimodal Language Sequences, 2019, ACL.
[14] Jaehoon Lee et al. Wide Neural Networks of Any Depth Evolve as Linear Models under Gradient Descent, 2019, NeurIPS.
[15] Pushmeet Kohli et al. Analysing Mathematical Reasoning Abilities of Neural Models, 2019, ICLR.
[16] Jianfeng Gao et al. Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving, 2019, ArXiv.
[17] Dan Klein et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, 2020, ArXiv.
[18] Jared Kaplan et al. A Neural Scaling Law from the Dimension of the Data Manifold, 2020, ArXiv.
[19] Jascha Sohl-Dickstein et al. The Large Learning Rate Phase of Deep Learning: The Catapult Mechanism, 2020, ArXiv.
[20] Ilya Sutskever et al. Jukebox: A Generative Model for Music, 2020, ArXiv.
[21] Mark Chen et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[22] Mathijs Mul et al. Compositionality Decomposed: How do Neural Networks Generalise?, 2019, J. Artif. Intell. Res.
[23] Mark Chen et al. Generative Pretraining From Pixels, 2020, ICML.
[24] Jonathan S. Rosenfeld et al. A Constructive Prediction of the Generalization Error Across Scales, 2019, ICLR.
[25] Alec Radford et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[26] Jakob Uszkoreit et al. Scaling Autoregressive Video Models, 2019, ICLR.
[27] Jonathan S. Rosenfeld et al. On the Predictability of Pruning Across Scales, 2020, ICML.
[28] Mary Williamson et al. Recipes for Building an Open-Domain Chatbot, 2020, EACL.