signADAM++: Learning Confidences for Deep Neural Networks
Licheng Jiao | Fanhua Shang | Hongying Liu | Dong Wang | Qigong Sun | Yicheng Liu | Wenwu Tang