Decoupled Weight Decay Regularization