Reservoir Transformers
[1] Andrew P. Bradley, et al. The use of the area under the ROC curve in the evaluation of machine learning algorithms, 1997, Pattern Recognition.
[2] Misha Denil, et al. Noisy Activation Functions, 2016, ICML.
[3] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[4] Magnus Sahlgren, et al. An Introduction to Random Indexing, 2005.
[5] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[6] Herbert Jaeger, et al. Adaptive Nonlinear System Identification with Echo State Networks, 2002, NIPS.
[7] Bohyung Han, et al. Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization, 2017, NIPS.
[8] Benjamin Recht, et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning, 2008, NIPS.
[9] Jason Yosinski, et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, 2019, NeurIPS.
[10] Toshiyuki Yamane, et al. Recent Advances in Physical Reservoir Computing: A Review, 2018, Neural Networks.
[11] A. Gamba, et al. Further experiments with PAPA, 1961.
[12] Marc'Aurelio Ranzato, et al. Classical Structured Prediction Losses for Sequence to Sequence Learning, 2017, NAACL.
[13] Guillermo Sapiro, et al. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?, 2015, IEEE Transactions on Signal Processing.
[14] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[15] Henry Markram, et al. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations, 2002, Neural Computation.
[16] Li Yang, et al. ETC: Encoding Long and Structured Inputs in Transformers, 2020, EMNLP.
[17] Marvin Minsky, et al. Perceptrons: An Introduction to Computational Geometry, 1969.
[18] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[19] Eric B. Baum, et al. On the capabilities of multilayer perceptrons, 1988, Journal of Complexity.
[20] Chee Kheong Siew, et al. Extreme learning machine: Theory and applications, 2006, Neurocomputing.
[21] M. C. Soriano, et al. Information Processing Using Transient Dynamics of Semiconductor Lasers Subject to Delayed Feedback, 2013, IEEE Journal of Selected Topics in Quantum Electronics.
[22] Xavier Serra, et al. Randomly Weighted CNNs for (Music) Audio Classification, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.
[24] John K. Tsotsos, et al. Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing, 2018, 2019 16th Conference on Computer and Robot Vision (CRV).
[25] Samuel R. Bowman, et al. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis, 2018, BlackboxNLP@EMNLP.
[26] W. B. Johnson, et al. Extensions of Lipschitz mappings into Hilbert space, 1984.
[27] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proceedings of the IEEE.
[28] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[29] Claudio Gallicchio, et al. Echo State Property of Deep Reservoir Computing Networks, 2017, Cognitive Computation.
[30] Benoît Sagot, et al. What Does BERT Learn about the Structure of Language?, 2019, ACL.
[31] Dianhui Wang, et al. Randomness in neural networks: an overview, 2017, WIREs Data Mining and Knowledge Discovery.
[32] Alex Graves, et al. Decoupled Neural Interfaces using Synthetic Gradients, 2016, ICML.
[33] C. Lee Giles, et al. An analysis of noise in recurrent neural networks: convergence and generalization, 1996, IEEE Transactions on Neural Networks.
[34] Thomas M. Cover, et al. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition, 1965, IEEE Transactions on Electronic Computers.
[35] Ali Farhadi, et al. What's Hidden in a Randomly Weighted Neural Network?, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[37] Dejan J. Sobajic, et al. Learning and generalization characteristics of the random vector Functional-link net, 1994, Neurocomputing.
[38] Herbert Jaeger, et al. Reservoir computing approaches to recurrent neural network training, 2009, Computer Science Review.
[39] Robert P. W. Duin, et al. Feedforward neural networks with random weights, 1992, Proceedings, 11th IAPR International Conference on Pattern Recognition, Vol. II: Pattern Recognition Methodology and Systems.
[40] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[41] Roy Schwartz, et al. Random Feature Attention, 2021, ICLR.
[42] Yuan Ni, et al. PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation, 2019, BioNLP@ACL.
[43] Lucy J. Colwell, et al. Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers, 2020, arXiv.
[44] Andrea Vedaldi, et al. Deep Image Prior, 2017, International Journal of Computer Vision.
[45] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[46] Benjamin Schrauwen, et al. Compact hardware for real-time speech recognition using a Liquid State Machine, 2007, 2007 International Joint Conference on Neural Networks.
[47] A. Gamba, et al. An outline of a mathematical theory of PAPA, 1961.
[48] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[49] Somnath Paul, et al. Event-Driven Random Back-Propagation: Enabling Neuromorphic Deep Learning Machines, 2016, Frontiers in Neuroscience.
[50] C. Lee Giles, et al. Effects of Noise on Convergence and Generalization in Recurrent Networks, 1994, NIPS.