BackPACK: Packing more into backprop
[1] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[2] Rio Yokota, et al. Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method, 2019, ICPP Workshops.
[3] Frederik Kunstner, et al. Limitations of the Empirical Fisher Approximation, 2019, NeurIPS.
[4] Frank Schneider, et al. DeepOBS: A Deep Learning Optimizer Benchmark Suite, 2019, ICLR.
[5] Philipp Hennig, et al. A Modular Approach to Block-diagonal Hessian Approximations for Second-order Optimization Methods, 2019, ArXiv.
[6] Michael Innes, et al. Don't Unroll Adjoint: Differentiating SSA-Form Programs, 2018, ArXiv.
[7] Pascal Vincent, et al. Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis, 2018, NeurIPS.
[8] Mike Innes, et al. Flux: Elegant machine learning with Julia, 2018, J. Open Source Softw.
[9] François Fleuret, et al. Not All Samples Are Created Equal: Deep Learning with Importance Sampling, 2018, ICML.
[10] Jimmy Ba, et al. Kronecker-factored Curvature Approximations for Recurrent Neural Networks, 2018, ICLR.
[11] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[12] David Barber, et al. Practical Gauss-Newton Optimisation for Deep Learning, 2017, ICML.
[13] Philipp Hennig, et al. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients, 2017, ICML.
[14] Javier Romero, et al. Coupling Adaptive Batch Sizes with Learning Rates, 2016, UAI.
[15] Andy Davis, et al. TensorFlow: A System for Large-Scale Machine Learning, 2016, OSDI.
[16] Roger B. Grosse, et al. A Kronecker-factored approximate Fisher matrix for convolution layers, 2016, ICML.
[17] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[18] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[19] Ian J. Goodfellow, et al. Efficient Per-Example Gradient Computations, 2015, ArXiv.
[20] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[21] Barak A. Pearlmutter, et al. Automatic differentiation in machine learning: a survey, 2015, J. Mach. Learn. Res.
[22] Christian Szegedy, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[23] Philipp Hennig, et al. Probabilistic Line Searches for Stochastic Optimization, 2015, NIPS.
[24] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[25] Thomas Brox, et al. Striving for Simplicity: The All Convolutional Net, 2014, ICLR.
[26] James Martens, et al. New perspectives on the natural gradient method, 2014, ArXiv.
[27] Patrick Gallinari, et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent, 2009, J. Mach. Learn. Res.
[28] S. Dreyfus, et al. Second-order stagewise backpropagation for Hessian-matrix analyses and investigation of negative curvature, 2008, Neural Networks.
[29] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[30] Nicol N. Schraudolph, et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[31] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[32] S. Hochreiter, et al. Long Short-Term Memory, 1997, Neural Computation.
[33] Kenta Oono, et al. Chainer: a Next-Generation Open Source Framework for Deep Learning, 2015.
[34] Yann LeCun, et al. Improving the convergence of back-propagation learning with second-order methods, 1989.