Samy Bengio | Cho-Jui Hsieh | Si Si | Yang Li | Gang Li
[1] Quoc V. Le,et al. Attention Augmented Convolutional Networks , 2019, ICCV.
[2] Jakob Grue Simonsen,et al. Encoding word order in complex embeddings , 2019, ICLR.
[3] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[4] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[5] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[6] Cho-Jui Hsieh,et al. Learning to Encode Position for Transformer with Continuous Dynamical Model , 2020, ICML.
[7] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[8] Douglas Eck,et al. An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation , 2018, ArXiv.
[9] Dennis DeCoste,et al. Compact Random Feature Maps , 2013, ICML.
[10] Vikas Sindhwani,et al. Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels , 2014, J. Mach. Learn. Res.
[11] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[12] Alex Sherstinsky,et al. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network , 2018, Physica D: Nonlinear Phenomena.
[13] Naoki Yoshinaga,et al. On the Relation between Position Information and Sentence Length in Neural Machine Translation , 2019, CoNLL.
[14] Lucy J. Colwell,et al. Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers , 2020, ArXiv.
[15] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[16] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[17] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[18] Zhiwei Guan,et al. Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements , 2020, EMNLP.
[19] Marius Leordeanu,et al. Recurrent Space-time Graph Neural Networks , 2019, NeurIPS.
[20] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[21] Ambuj Tewari,et al. But How Does It Work in Theory? Linear SVM with Random Features , 2018, NeurIPS.
[22] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[23] Ashish Vaswani,et al. Stand-Alone Self-Attention in Vision Models , 2019, NeurIPS.
[24] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[25] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[26] Benjamin Recht,et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.
[27] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[28] Lukasz Kaiser,et al. Reformer: The Efficient Transformer , 2020, ICLR.
[29] Jianfeng Gao,et al. DeBERTa: Decoding-enhanced BERT with Disentangled Attention , 2020, ICLR.
[30] Jakob Grue Simonsen,et al. On Position Embeddings in BERT , 2021, ICLR.
[31] Xin Zhou,et al. Mapping Natural Language Instructions to Mobile UI Action Sequences , 2020, ACL.
[32] Giambattista Parascandolo,et al. Taming the waves: sine as activation function in deep neural networks , 2017.
[33] Antoine Liutkus,et al. Relative Positional Encoding for Transformers with Linear Complexity , 2021, ICML.
[34] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[35] Chris Quirk,et al. Novel positional encodings to enable tree-based transformers , 2019, NeurIPS.
[36] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[37] Xing Wang,et al. Self-Attention with Structural Position Representations , 2019, EMNLP.
[38] Dustin Tran,et al. Image Transformer , 2018, ICML.
[39] Sanjiv Kumar,et al. Learning Adaptive Random Features , 2019, AAAI.
[40] Matthijs Douze,et al. LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference , 2021, ICCV.
[41] Alexander J. Smola,et al. Fastfood: Approximate Kernel Expansions in Loglinear Time , 2014, ArXiv.