Kurt Keutzer | Michael W. Mahoney | Sheng Shen | Amir Gholami | Zhewei Yao
[1] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[2] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[3] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[4] Xiangyu Zhang, et al. Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization, 2020, ICLR.
[5] Di He, et al. Representation Degeneration Problem in Training Natural Language Generation Models, 2019, ICLR.
[6] Geoffrey E. Hinton, et al. ImageNet Classification with Deep Convolutional Neural Networks, 2012, Commun. ACM.
[7] Ross B. Girshick, et al. Focal Loss for Dense Object Detection, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] Abhinav Shrivastava, et al. EvalNorm: Estimating Batch Normalization Statistics for Evaluation, 2019, IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Andrea Vedaldi, et al. Instance Normalization: The Missing Ingredient for Fast Stylization, 2016, arXiv.
[10] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[11] Kaiming He, et al. Focal Loss for Dense Object Detection, 2017, IEEE International Conference on Computer Vision (ICCV).
[12] Aleksander Madry, et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), 2018, NeurIPS.
[13] Geoffrey E. Hinton, et al. Neighbourhood Components Analysis, 2004, NIPS.
[14] Lukás Burget, et al. Empirical Evaluation and Combination of Advanced Language Modeling Techniques, 2011, INTERSPEECH.
[15] Hakan Inan, et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling, 2016, ICLR.
[16] Zhiyuan Zhang, et al. Understanding and Improving Layer Normalization, 2019, NeurIPS.
[17] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[18] Myle Ott, et al. Scaling Neural Machine Translation, 2018, WMT.
[19] Yann LeCun, et al. What is the Best Multi-Stage Architecture for Object Recognition?, 2009, IEEE 12th International Conference on Computer Vision (ICCV).
[20] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.
[23] Ming Zhou, et al. A Tensorized Transformer for Language Modeling, 2019, NeurIPS.
[24] Julian Salazar, et al. Transformers without Tears: Improving the Normalization of Self-Attention, 2019, arXiv.
[25] Alexei Baevski, et al. Adaptive Input Representations for Neural Language Modeling, 2018, ICLR.
[26] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[27] Jingbo Zhu, et al. Learning Deep Transformer Models for Machine Translation, 2019, ACL.
[28] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[29] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[30] Rico Sennrich, et al. Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention, 2019, EMNLP.
[31] Jing Huang, et al. Improving Neural Language Generation with Spectrum Control, 2020, ICLR.
[32] Rico Sennrich, et al. Root Mean Square Layer Normalization, 2019, NeurIPS.
[33] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[34] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[35] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[36] Kurt Keutzer, et al. PyHessian: Neural Networks Through the Lens of the Hessian, 2020, IEEE International Conference on Big Data (Big Data).
[37] Kilian Q. Weinberger, et al. Positional Normalization, 2019, NeurIPS.
[38] Aaron C. Courville, et al. Recurrent Batch Normalization, 2016, ICLR.
[39] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[40] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Yann Dauphin, et al. Pay Less Attention with Lightweight and Dynamic Convolutions, 2019, ICLR.
[42] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[43] Ruslan Salakhutdinov, et al. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model, 2017, ICLR.
[44] Sergey Ioffe, et al. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models, 2017, NIPS.
[45] Tie-Yan Liu, et al. Normalization Helps Training of Quantized LSTM, 2019, NeurIPS.
[46] Kaiming He, et al. Group Normalization, 2018, ECCV.