Reservoir Transformers
[1] Yi Tay,et al. Efficient Transformers: A Survey , 2020, ACM Comput. Surv..
[2] Roy Schwartz,et al. Random Feature Attention , 2021, ICLR.
[3] Shen Li,et al. PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers , 2021, ArXiv.
[4] Ryan P. Adams,et al. Randomized Automatic Differentiation , 2020, ICLR.
[5] Yi Tay,et al. Synthesizer: Rethinking Self-Attention for Transformer Models , 2020, ICML.
[6] Garrison W. Cottrell,et al. ReZero is All You Need: Fast Convergence at Large Depth , 2020, UAI.
[7] David J. Schwab,et al. Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs , 2020, ICLR.
[8] Anna Rumshisky,et al. A Primer in BERTology: What We Know About How BERT Works , 2020, Transactions of the Association for Computational Linguistics.
[9] M. Zaheer,et al. Big Bird: Transformers for Longer Sequences , 2020, NeurIPS.
[10] Nikolaos Pappas,et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention , 2020, ICML.
[11] Noah A. Smith,et al. Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation , 2020, ArXiv.
[12] Lucy J. Colwell,et al. Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers , 2020, ArXiv.
[13] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[14] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[15] Li Yang,et al. ETC: Encoding Long and Structured Inputs in Transformers , 2020, EMNLP.
[16] Arman Cohan,et al. Longformer: The Long-Document Transformer , 2020, ArXiv.
[17] Yiming Yang,et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices , 2020, ACL.
[18] Ivan Titov,et al. Information-Theoretic Probing with Minimum Description Length , 2020, EMNLP.
[19] Yuan Cao,et al. Echo State Neural Machine Translation , 2020, ArXiv.
[20] Dan Klein,et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers , 2020, ArXiv.
[21] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[22] Lukasz Kaiser,et al. Reformer: The Efficient Transformer , 2020, ICLR.
[23] Ali Farhadi,et al. What’s Hidden in a Randomly Weighted Neural Network? , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Noah A. Smith,et al. Improving Transformer Models by Reordering their Sublayers , 2019, ACL.
[25] Michael Auli,et al. Depth-Adaptive Transformer , 2019, ICLR.
[26] Michael Auli,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[27] Edouard Grave,et al. Reducing Transformer Depth on Demand with Structured Dropout , 2019, ICLR.
[28] Michael W. Mahoney,et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT , 2019, AAAI.
[29] Oren Etzioni,et al. Green AI , 2019, Commun. ACM.
[30] Claudio Gallicchio,et al. Deep Randomized Neural Networks , 2020, INNSBDDL.
[31] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[32] Vitalii Zhelezniak,et al. Neural Language Priors , 2019, ArXiv.
[33] Yuan Ni,et al. PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation , 2019, BioNLP@ACL.
[34] Chris Pal,et al. On the impressive performance of randomly weighted encoders in summarization tasks , 2019, ACL.
[35] Benoît Sagot,et al. What Does BERT Learn about the Structure of Language? , 2019, ACL.
[36] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[37] Andrew McCallum,et al. Energy and Policy Considerations for Deep Learning in NLP , 2019, ACL.
[38] Dipanjan Das,et al. BERT Rediscovers the Classical NLP Pipeline , 2019, ACL.
[39] Jason Yosinski,et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask , 2019, NeurIPS.
[40] Samy Bengio,et al. Are All Layers Created Equal? , 2019, J. Mach. Learn. Res..
[41] Felix Wu,et al. Pay Less Attention with Lightweight and Dynamic Convolutions , 2019, ICLR.
[42] Douwe Kiela,et al. No Training Required: Exploring Random Encoders for Sentence Classification , 2019, ICLR.
[43] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[44] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[45] Toshiyuki Yamane,et al. Recent Advances in Physical Reservoir Computing: A Review , 2018, Neural Networks.
[46] Xavier Serra,et al. Randomly Weighted CNNs for (Music) Audio Classification , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] Michael Carbin,et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.
[48] John K. Tsotsos,et al. Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing , 2018, 2019 16th Conference on Computer and Robot Vision (CRV).
[49] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[50] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.
[51] Samuel R. Bowman,et al. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis , 2018, BlackboxNLP@EMNLP.
[52] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[53] Myle Ott,et al. Scaling Neural Machine Translation , 2018, WMT.
[54] Andrea Vedaldi,et al. Deep Image Prior , 2017, International Journal of Computer Vision.
[55] Marc'Aurelio Ranzato,et al. Classical Structured Prediction Losses for Sequence to Sequence Learning , 2017, NAACL.
[56] Samuel R. Bowman,et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.
[57] Bohyung Han,et al. Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization , 2017, NIPS.
[58] Theodore Lim,et al. FreezeOut: Accelerate Training by Progressively Freezing Layers , 2017, NIPS 2017.
[59] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[60] Claudio Gallicchio,et al. Echo State Property of Deep Reservoir Computing Networks , 2017, Cognitive Computation.
[61] Holger Schwenk,et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.
[62] Dianhui Wang,et al. Randomness in neural networks: an overview , 2017, WIREs Data Mining Knowl. Discov..
[63] Max Jaderberg,et al. Understanding Synthetic Gradients and Decoupled Neural Interfaces , 2017, ICML.
[64] Somnath Paul,et al. Event-Driven Random Back-Propagation: Enabling Neuromorphic Deep Learning Machines , 2016, Front. Neurosci..
[65] Alex Graves,et al. Decoupled Neural Interfaces using Synthetic Gradients , 2016, ICML.
[66] H. Jaeger,et al. Unconventional Information Processing Systems, Novel Hardware: A Tour d'Horizon , 2017.
[67] Misha Denil,et al. Noisy Activation Functions , 2016, ICML.
[68] Yoram Singer,et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity , 2016, NIPS.
[69] Guillermo Sapiro,et al. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? , 2015, IEEE Transactions on Signal Processing.
[70] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[71] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[72] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[73] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[74] M. C. Soriano,et al. Information Processing Using Transient Dynamics of Semiconductor Lasers Subject to Delayed Feedback , 2013, IEEE Journal of Selected Topics in Quantum Electronics.
[75] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[76] Herbert Jaeger,et al. Reservoir computing approaches to recurrent neural network training , 2009, Comput. Sci. Rev..
[77] AI Koan,et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.
[78] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[79] Benjamin Schrauwen,et al. Compact hardware for real-time speech recognition using a Liquid State Machine , 2007, 2007 International Joint Conference on Neural Networks.
[80] Chee Kheong Siew,et al. Extreme learning machine: Theory and applications , 2006, Neurocomputing.
[81] Magnus Sahlgren,et al. An Introduction to Random Indexing , 2005.
[82] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[83] Henry Markram,et al. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations , 2002, Neural Computation.
[84] Herbert Jaeger,et al. Adaptive Nonlinear System Identification with Echo State Networks , 2002, NIPS.
[85] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[86] Andrew P. Bradley,et al. The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..
[87] C. Lee Giles,et al. An analysis of noise in recurrent neural networks: convergence and generalization , 1996, IEEE Trans. Neural Networks.
[88] Dejan J. Sobajic,et al. Learning and generalization characteristics of the random vector Functional-link net , 1994, Neurocomputing.
[89] C. Lee Giles,et al. Effects of Noise on Convergence and Generalization in Recurrent Networks , 1994, NIPS.
[90] Robert P. W. Duin,et al. Feedforward neural networks with random weights , 1992, Proceedings., 11th IAPR International Conference on Pattern Recognition. Vol.II. Conference B: Pattern Recognition Methodology and Systems.
[91] Eric B. Baum,et al. On the capabilities of multilayer perceptrons , 1988, J. Complex..
[92] W. B. Johnson,et al. Extensions of Lipschitz mappings into Hilbert space , 1984.
[93] Marvin Minsky,et al. Perceptrons: An Introduction to Computational Geometry , 1969.
[94] Thomas M. Cover,et al. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition , 1965, IEEE Trans. Electron. Comput..
[95] H. D. Block. The perceptron: a model for brain functioning. I , 1962.
[96] A. Gamba,et al. Further experiments with PAPA , 1961.
[97] A. Gamba,et al. An outline of a mathematical theory of PAPA , 1961.