Taming Hyper-parameters in Deep Learning Systems
Peter Pietzuch | Alexandros Koliousis | Luo Mai | Andrei-Octavian Brabete | Guo Li
[1] Frank Hutter et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.
[2] Tie-Yan Liu et al. Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling, 2017, Neurocomputing.
[3] Elad Hoffer et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[4] Jorge Nocedal et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[5] Marc'Aurelio Ranzato et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[6] Klaus-Robert Müller et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[7] Kaiming He et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[8] Yuanzhou Yang et al. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes, 2018, ArXiv.
[9] Roger B. Grosse et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[10] Mark W. Schmidt et al. Online Learning Rate Adaptation with Hypergradient Descent, 2017, ICLR.
[11] A. Krizhevsky. Convolutional Deep Belief Networks on CIFAR-10, 2010.
[12] Fei-Fei Li et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[13] H. Robbins. A Stochastic Approximation Method, 1951.
[14] Yoshua Bengio et al. Generative Adversarial Nets, 2014, NIPS.
[15] Pritish Narayanan et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[16] Jorge Nocedal et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[17] William J. Dally et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[18] Yuanzhi Li et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[19] Amit Agarwal et al. CNTK: Microsoft's Open-Source Deep-Learning Toolkit, 2016, KDD.
[20] James Demmel et al. Reducing BERT Pre-Training Time from 3 Days to 76 Minutes, 2019, ArXiv.
[21] Yuan Yu et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[22] Ioannis Mitliagkas et al. YellowFin and the Art of Momentum Tuning, 2017, MLSys.
[23] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[24] Adam Coates et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[25] Boris Polyak et al. Acceleration of stochastic approximation by averaging, 1992.
[26] Matthias Weidlich et al. Crossbow: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers, 2019, Proc. VLDB Endow.
[27] Quoc V. Le et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.
[28] Léon Bottou. On-line learning and stochastic approximations, 1999.
[29] Aaron Klein et al. Efficient and Robust Automated Machine Learning, 2015, NIPS.
[30] Yoram Singer et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[31] Yang You et al. Scaling SGD Batch Size to 32K for ImageNet Training, 2017, ArXiv.
[32] Jasper Snoek et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[33] Jian Sun et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[34] David A. Patterson et al. A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution, 2018, IEEE Micro.
[35] Jian Zhang et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[36] D. Ruppert. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.
[37] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[38] Carlo Luschi et al. Revisiting Small Batch Training for Deep Neural Networks, 2018, ArXiv.
[39] Lin Ma et al. Self-Driving Database Management Systems, 2017, CIDR.
[40] Alexander J. Smola et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[41] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[42] Dario Amodei et al. An Empirical Model of Large-Batch Training, 2018, ArXiv.
[43] Takuya Akiba et al. Variance-based Gradient Compression for Efficient Distributed Deep Learning, 2018, ICLR.
[44] Zheng Zhang et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[45] Brendan J. Frey et al. Adaptive dropout for training deep neural networks, 2013, NIPS.
[46] Yuanzhi Li et al. Can SGD Learn Recurrent Neural Networks with Provable Generalization?, 2019, NeurIPS.
[47] Jia Deng et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[48] Qingquan Song et al. Efficient Neural Architecture Search with Network Morphism, 2018, ArXiv.