Sheng Shen | Alexei Baevski | Ari S. Morcos | Kurt Keutzer | Michael Auli | Douwe Kiela
[1] A. Gamba,et al. Further experiments with PAPA , 1961 .
[2] A. Gamba,et al. An outline of a mathematical theory of PAPA , 1961 .
[3] H. D. Block. The perceptron: a model for brain functioning. I , 1962 .
[4] Thomas M. Cover,et al. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition , 1965, IEEE Trans. Electron. Comput..
[5] Marvin Minsky,et al. Perceptrons: An Introduction to Computational Geometry , 1969 .
[6] W. B. Johnson,et al. Extensions of Lipschitz mappings into Hilbert space , 1984 .
[7] Eric B. Baum,et al. On the capabilities of multilayer perceptrons , 1988, J. Complex..
[8] Robert P. W. Duin,et al. Feedforward neural networks with random weights , 1992, Proceedings., 11th IAPR International Conference on Pattern Recognition. Vol.II. Conference B: Pattern Recognition Methodology and Systems.
[9] Dejan J. Sobajic,et al. Learning and generalization characteristics of the random vector Functional-link net , 1994, Neurocomputing.
[10] C. Lee Giles,et al. Effects of Noise on Convergence and Generalization in Recurrent Networks , 1994, NIPS.
[11] C. Lee Giles,et al. An analysis of noise in recurrent neural networks: convergence and generalization , 1996, IEEE Trans. Neural Networks.
[12] Andrew P. Bradley,et al. The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..
[13] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[14] Herbert Jaeger,et al. Adaptive Nonlinear System Identification with Echo State Networks , 2002, NIPS.
[15] Henry Markram,et al. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations , 2002, Neural Computation.
[16] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[17] Magnus Sahlgren,et al. An Introduction to Random Indexing , 2005 .
[18] Chee Kheong Siew,et al. Extreme learning machine: Theory and applications , 2006, Neurocomputing.
[19] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[20] Benjamin Schrauwen,et al. Compact hardware for real-time speech recognition using a Liquid State Machine , 2007, 2007 International Joint Conference on Neural Networks.
[21] Ali Rahimi,et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.
[22] Herbert Jaeger,et al. Reservoir computing approaches to recurrent neural network training , 2009, Comput. Sci. Rev..
[23] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[24] Marcello Federico,et al. Report on the 10th IWSLT evaluation campaign , 2013, IWSLT.
[25] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[26] M. C. Soriano,et al. Information Processing Using Transient Dynamics of Semiconductor Lasers Subject to Delayed Feedback , 2013, IEEE Journal of Selected Topics in Quantum Electronics.
[27] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[28] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[29] Philipp Koehn,et al. Findings of the 2014 Workshop on Statistical Machine Translation , 2014, WMT@ACL.
[30] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[31] Misha Denil,et al. Noisy Activation Functions , 2016, ICML.
[32] Guillermo Sapiro,et al. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? , 2015, IEEE Transactions on Signal Processing.
[33] Yoram Singer,et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity , 2016, NIPS.
[34] Claudio Gallicchio,et al. Echo State Property of Deep Reservoir Computing Networks , 2017, Cognitive Computation.
[35] Max Jaderberg,et al. Understanding Synthetic Gradients and Decoupled Neural Interfaces , 2017, ICML.
[36] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[37] Alex Graves,et al. Decoupled Neural Interfaces using Synthetic Gradients , 2016, ICML.
[38] Bohyung Han,et al. Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization , 2017, NIPS.
[39] Dianhui Wang,et al. Randomness in neural networks: an overview , 2017, WIREs Data Mining Knowl. Discov..
[40] Theodore Lim,et al. FreezeOut: Accelerate Training by Progressively Freezing Layers , 2017, NIPS 2017.
[41] Somnath Paul,et al. Event-Driven Random Back-Propagation: Enabling Neuromorphic Deep Learning Machines , 2016, Front. Neurosci..
[42] H. Jaeger,et al. Unconventional Information Processing Systems, Novel Hardware: A Tour d'Horizon , 2017 .
[43] Holger Schwenk,et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.
[44] Samuel R. Bowman,et al. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis , 2018, BlackboxNLP@EMNLP.
[45] Marc'Aurelio Ranzato,et al. Classical Structured Prediction Losses for Sequence to Sequence Learning , 2017, NAACL.
[46] Samuel R. Bowman,et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.
[47] Myle Ott,et al. Scaling Neural Machine Translation , 2018, WMT.
[48] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[49] Andrea Vedaldi,et al. Deep Image Prior , 2017, International Journal of Computer Vision.
[50] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[51] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[52] Michael Carbin,et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.
[53] Benoît Sagot,et al. What Does BERT Learn about the Structure of Language? , 2019, ACL.
[54] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[55] Xavier Serra,et al. Randomly Weighted CNNs for (Music) Audio Classification , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Samy Bengio,et al. Are All Layers Created Equal? , 2019, J. Mach. Learn. Res..
[57] Douwe Kiela,et al. No Training Required: Exploring Random Encoders for Sentence Classification , 2019, ICLR.
[58] John K. Tsotsos,et al. Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing , 2018, 2019 16th Conference on Computer and Robot Vision (CRV).
[59] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[60] Claudio Gallicchio,et al. Deep Randomized Neural Networks , 2020, INNSBDDL.
[61] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[62] Andrew McCallum,et al. Energy and Policy Considerations for Deep Learning in NLP , 2019, ACL.
[63] Toshiyuki Yamane,et al. Recent Advances in Physical Reservoir Computing: A Review , 2018, Neural Networks.
[64] Chris Pal,et al. On the impressive performance of randomly weighted encoders in summarization tasks , 2019, ACL 2019.
[65] Yann Dauphin,et al. Pay Less Attention with Lightweight and Dynamic Convolutions , 2019, ICLR.
[66] Jason Yosinski,et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask , 2019, NeurIPS.
[67] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[68] Michael Auli,et al. Depth-Adaptive Transformer , 2019, ICLR.
[69] M. Zaheer,et al. Big Bird: Transformers for Longer Sequences , 2020, NeurIPS.
[70] Lukasz Kaiser,et al. Reformer: The Efficient Transformer , 2020, ICLR.
[71] Lucy J. Colwell,et al. Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers , 2020, ArXiv.
[72] Dan Klein,et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers , 2020, ArXiv.
[73] Noah A. Smith,et al. Improving Transformer Models by Reordering their Sublayers , 2019, ACL.
[74] Jacob Devlin,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[75] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[76] Nikolaos Pappas,et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention , 2020, ICML.
[77] Arman Cohan,et al. Longformer: The Long-Document Transformer , 2020, ArXiv.
[78] Ali Farhadi,et al. What’s Hidden in a Randomly Weighted Neural Network? , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[79] Yuan Cao,et al. Echo State Neural Machine Translation , 2020, ArXiv.
[80] Yi Tay,et al. Efficient Transformers: A Survey , 2020, ACM Comput. Surv..
[81] Noah A. Smith,et al. Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation , 2020, ArXiv.
[82] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[83] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[84] Han Fang,et al. Linformer: Self-Attention with Linear Complexity , 2020, ArXiv.
[85] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[86] Edouard Grave,et al. Reducing Transformer Depth on Demand with Structured Dropout , 2019, ICLR.
[87] Garrison W. Cottrell,et al. ReZero is All You Need: Fast Convergence at Large Depth , 2020, UAI.
[88] Ryan P. Adams,et al. Randomized Automatic Differentiation , 2020, ICLR.
[89] Yi Tay,et al. Synthesizer: Rethinking Self-Attention for Transformer Models , 2020, ICML.
[90] David J. Schwab,et al. Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs , 2020, ICLR.
[91] Anna Rumshisky,et al. A Primer in BERTology: What We Know About How BERT Works , 2020, Transactions of the Association for Computational Linguistics.