Jeremy Bernstein | Jiawei Zhao | Kamyar Azizzadenesheli | Anima Anandkumar
[1] F. P. Cantelli. Sui confini della probabilità, 1929.
[2] Martin A. Riedmiller et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm, 1993, IEEE International Conference on Neural Networks.
[3] F. Pukelsheim. The Three Sigma Rule, 1994.
[4] John J. Hopfield. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences.
[5] Yurii Nesterov et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
[6] H. Robbins. A Stochastic Approximation Method, 1951.
[7] Dong Yu et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[8] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[9] Volkan Cevher et al. Stochastic Spectral Descent for Discrete Graphical Models, 2016, IEEE Journal of Selected Topics in Signal Processing.
[10] Cong Xu et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[11] Dan Alistarh et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[12] Rachid Guerraoui et al. Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent, 2017, NIPS.
[13] Luca Antiga et al. Automatic differentiation in PyTorch, 2017.
[14] Sashank J. Reddi et al. On the Convergence of Adam and Beyond, 2018, ICLR.
[15] Dan Alistarh et al. Byzantine Stochastic Gradient Descent, 2018, NeurIPS.
[16] Richard Socher et al. An Analysis of Neural Language Modeling at Multiple Scales, 2018, arXiv.
[17] William J. Dally et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[18] Kamyar Azizzadenesheli et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[19] Dimitris S. Papailiopoulos et al. ATOMO: Communication-efficient Learning via Atomic Sparsification, 2018, NeurIPS.
[20] Philipp Hennig et al. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients, 2017, ICML.