Aurko Roy | Ashish Vaswani | David Grangier | Mohammad Saffar