Agris Sostaks | Emils Ozolins | Kārlis Freivalds | Andis Draguns | Matiss Apinis