Dan Klein | Kurt Keutzer | Joseph E. Gonzalez | Eric Wallace | Sheng Shen | Zhuohan Li | Kevin Lin
[1] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[2] James Demmel, et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes, 2019, ICLR.
[3] Erich Elsen, et al. Rigging the Lottery: Making All Tickets Winners, 2020, ICML.
[4] Alex Graves, et al. Memory-Efficient Backpropagation Through Time, 2016, NIPS.
[5] Xin Wang, et al. Clipper: A Low-Latency Online Prediction Serving System, 2016, NSDI.
[6] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[7] Myle Ott, et al. Scaling Neural Machine Translation, 2018, WMT.
[8] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[9] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.
[10] Erich Elsen, et al. Fast Sparse ConvNets, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Ali Razavi, et al. Data-Efficient Image Recognition with Contrastive Predictive Coding, 2019, ICML.
[12] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[13] Jang Hyun Cho, et al. On the Efficacy of Knowledge Distillation, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[14] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[15] Jingbo Zhu, et al. Learning Deep Transformer Models for Machine Translation, 2019, ACL.
[16] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.
[17] Matthew Mattina, et al. Compressing RNNs for IoT devices by 15-38x using Kronecker Products, 2019, ArXiv.
[18] Hassan Ghasemzadeh, et al. Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher, 2019, ArXiv.
[19] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[20] Iasonas Kokkinos, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[21] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[22] Ersin Yumer, et al. Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints, 2019, ICLR.
[23] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[24] Di He, et al. Efficient Training of BERT by Progressively Stacking, 2019, ICML.
[25] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[26] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[27] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[28] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[29] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[30] Ali Farhadi, et al. Soft Threshold Weight Reparameterization for Learnable Sparsity, 2020, ICML.
[31] 知秀 柴田. Understand in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[32] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[33] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[34] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[35] Kurt Keutzer, et al. HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[36] Samy Bengio, et al. Tensor2Tensor for Neural Machine Translation, 2018, AMTA.
[37] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[38] Kurt Keutzer, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2020, AAAI.
[39] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[40] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[41] Hanan Samet, et al. Pruning Filters for Efficient ConvNets, 2016, ICLR.
[42] Gregory Diamos, et al. Empirically Characterizing Overparameterization Impact on Convergence, 2018.
[43] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[44] Max Welling, et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.
[45] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[46] Dustin Tran, et al. Mesh-TensorFlow: Deep Learning for Supercomputers, 2018, NeurIPS.
[47] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[48] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[49] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[50] Suyog Gupta, et al. To prune, or not to prune: exploring the efficacy of pruning for model compression, 2017, ICLR.
[51] Christopher D. Manning, et al. Compression of Neural Machine Translation Models via Pruning, 2016, CoNLL.
[52] Lukasz Kaiser, et al. Universal Transformers, 2018, ICLR.
[53] Dario Amodei, et al. An Empirical Model of Large-Batch Training, 2018, ArXiv.
[54] Raquel Urtasun, et al. The Reversible Residual Network: Backpropagation Without Storing Activations, 2017, NIPS.
[55] Ming-Wei Chang, et al. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation, 2019, ArXiv.
[56] Kurt Keutzer, et al. Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization, 2019, MLSys.
[57] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[58] Jeff Donahue, et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018, ICLR.
[59] Mikhail Belkin, et al. Reconciling modern machine learning and the bias-variance trade-off, 2018, ArXiv.
[60] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[61] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[62] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[63] Sanjeev Arora, et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, 2018, ICML.
[64] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[65] Fedor Moiseev, et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, 2019, ACL.
[66] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[67] Nikko Ström, et al. Sparse connection and pruning in large dynamic artificial neural networks, 1997, EUROSPEECH.
[68] Surya Ganguli, et al. On the Expressive Power of Deep Neural Networks, 2016, ICML.
[69] Liwei Wang, et al. The Expressive Power of Neural Networks: A View from the Width, 2017, NIPS.
[70] Jianxin Wu, et al. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[71] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[72] Yi Zhang, et al. Stronger generalization bounds for deep nets via a compression approach, 2018, ICML.
[73] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[74] Aaron Klein, et al. Efficient and Robust Automated Machine Learning, 2015, NIPS.