Xiaobo Liang | Lijun Wu | Juntao Li | Yue Wang | Qi Meng | Tao Qin | Wei Chen | Min Zhang | Tie-Yan Liu