Bird-Eye Transformers for Text Generation Models
[1] Anna Maria Di Sciullo. On Aspects of the Theory of Syntax , 2021, Inference: International Review of Science.
[2] Elena Agliari,et al. Boltzmann Machines as Generalized Hopfield Networks: A Review of Recent Results and Outlooks , 2020, Entropy.
[3] Yi Tay,et al. Efficient Transformers: A Survey , 2020, ACM Comput. Surv.
[4] J. Hopfield,et al. Large Associative Memory Problem in Neurobiology and Machine Learning , 2020, ICLR.
[5] Vitaly Feldman,et al. What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation , 2020, NeurIPS.
[6] M. Zaheer,et al. Big Bird: Transformers for Longer Sequences , 2020, NeurIPS.
[7] Nikolaos Pappas,et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention , 2020, ICML.
[8] Li Yang,et al. ETC: Encoding Long and Structured Inputs in Transformers , 2020, EMNLP.
[9] Arman Cohan,et al. Longformer: The Long-Document Transformer , 2020, ArXiv.
[10] Aurko Roy,et al. Efficient Content-Based Sparse Attention with Routing Transformers , 2020, TACL.
[11] Liu Yang,et al. Sparse Sinkhorn Attention , 2020, ICML.
[12] S. Venkatesh,et al. Self-Attentive Associative Memory , 2020, ICML.
[13] Lukasz Kaiser,et al. Reformer: The Efficient Transformer , 2020, ICLR.
[14] Xuancheng Ren,et al. Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection , 2019, ArXiv.
[15] Omer Levy,et al. Blockwise Self-Attention for Long Document Understanding , 2019, Findings of EMNLP.
[16] Chris Quirk,et al. Novel positional encodings to enable tree-based transformers , 2019, NeurIPS.
[17] Mikhail Belkin,et al. Overparameterized neural networks implement associative memory , 2019, Proceedings of the National Academy of Sciences.
[18] Tim Salimans,et al. Axial Attention in Multidimensional Transformers , 2019, ArXiv.
[19] Hung-yi Lee,et al. Tree Transformer: Integrating Tree Structures into Self-Attention , 2019, EMNLP.
[20] Guillaume Lample,et al. Augmenting Self-attention with Persistent Memory , 2019, ArXiv.
[21] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[22] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[23] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[24] Tong Zhang,et al. Modeling Localness for Self-Attention Networks , 2018, EMNLP.
[25] Yee Whye Teh,et al. Set Transformer , 2018, ICML.
[26] Aaron C. Courville,et al. Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks , 2018, ICLR.
[27] Satrajit Chatterjee,et al. Learning and Memorization , 2018, ICML.
[28] Zhifang Sui,et al. Jointly Extracting Event Triggers and Arguments by Dependency-Bridge RNN and Tensor-Based Argument Interaction , 2018, AAAI.
[29] Dustin Tran,et al. Image Transformer , 2018, ICML.
[30] Peter J. Liu,et al. Generating Wikipedia by Summarizing Long Sequences , 2018, ICLR.
[31] Jin-Hui Wang,et al. Associative memory cells and their working principle in the brain , 2018, F1000Research.
[32] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[33] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[34] John J. Hopfield,et al. Dense Associative Memory for Pattern Recognition , 2016, NIPS.
[35] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Alexandra Birch,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[37] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[38] Alex Graves. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[39] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[40] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[41] Robert L. Mercer,et al. An Estimate of an Upper Bound for the Entropy of English , 1992, CL.
[42] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[43] Mauro Cettolo,et al. The IWSLT 2016 Evaluation Campaign , 2016, IWSLT.
[44] R. Miikkulainen. Hopfield Network , 2010, Encyclopedia of Machine Learning and Data Mining.
[45] N. Chomsky. Three models for the description of language , 1956, IRE Trans. Inf. Theory.