ASDL: A Unified Interface for Gradient Preconditioning in PyTorch
[1] Dong Xu, et al. Sketch-Based Empirical Natural Gradient Methods for Deep Learning, 2022, Journal of Scientific Computing.
[2] N. Higham, et al. Mixed precision algorithms in numerical linear algebra, 2022, Acta Numerica.
[3] Donald Goldfarb, et al. Tensor Normal Training for Deep Learning Models, 2021, NeurIPS.
[4] Yue Wu, et al. SKFAC: Training Neural Networks with Faster Kronecker-Factored Approximate Curvature, 2021, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Yi Zhang, et al. Efficient Full-Matrix Adaptive Regularization, 2020, ICML.
[6] Rio Yokota, et al. Rich Information is Affordable: A Systematic Performance Analysis of Second-order Optimization Using K-FAC, 2020, KDD.
[7] Yihao Fang, et al. Optimization of Graph Neural Networks with Natural Gradient Descent, 2020, IEEE International Conference on Big Data (Big Data).
[8] Donald Goldfarb, et al. Practical Quasi-Newton Methods for Training Deep Neural Networks, 2020, NeurIPS.
[9] Z. Wen, et al. Sketchy Empirical Natural Gradient Methods for Deep Learning, 2020, arXiv:2006.05924.
[10] Kurt Keutzer, et al. ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning, 2020, AAAI.
[11] Mohammad Emtiyaz Khan, et al. Continual Deep Learning by Functional Regularisation of Memorable Past, 2020, NeurIPS.
[12] Chuan-Sheng Foo, et al. Scalable and Practical Natural Gradient for Large-Scale Deep Learning, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[13] Philipp Hennig, et al. BackPACK: Packing more into backprop, 2019, ICLR.
[14] Michael W. Mahoney, et al. PyHessian: Neural Networks Through the Lens of the Hessian, 2019, IEEE International Conference on Big Data (Big Data).
[15] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[16] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[17] Andrew Y. Ng, et al. NGBoost: Natural Gradient Boosting for Probabilistic Prediction, 2019, ICML.
[18] J. Stokes, et al. Quantum Natural Gradient, 2019, Quantum.
[19] Yi Ren, et al. Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks, 2019, arXiv.
[20] Frederik Kunstner, et al. Limitations of the empirical Fisher approximation for natural gradient descent, 2019, NeurIPS.
[21] Dario Amodei, et al. An Empirical Model of Large-Batch Training, 2018, arXiv.
[22] Satoshi Matsuoka, et al. Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Didrik Nielsen, et al. Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam, 2018, ICML.
[24] Pascal Vincent, et al. Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis, 2018, NeurIPS.
[25] Yoram Singer, et al. Shampoo: Preconditioned Stochastic Tensor Optimization, 2018, ICML.
[26] Rif A. Saurous, et al. Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks, 2017, ICLR.
[27] Guodong Zhang, et al. Noisy Natural Gradient as Variational Inference, 2017, ICML.
[28] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[29] Yang You, et al. Large Batch Training of Convolutional Networks, 2017, arXiv:1708.03888.
[30] David Barber, et al. Practical Gauss-Newton Optimisation for Deep Learning, 2017, ICML.
[31] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[32] Percy Liang, et al. Understanding Black-box Predictions via Influence Functions, 2017, ICML.
[33] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[34] Naman Agarwal, et al. Second-Order Stochastic Optimization for Machine Learning in Linear Time, 2016, J. Mach. Learn. Res.
[35] Xi-Lin Li, et al. Preconditioned Stochastic Gradient Descent, 2015, IEEE Transactions on Neural Networks and Learning Systems.
[36] Ruslan Salakhutdinov, et al. Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix, 2015, ICML.
[37] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[38] Yoshua Bengio, et al. Equilibrated adaptive learning rates for non-convex optimization, 2015, NIPS.
[39] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[40] James Martens, et al. New Insights and Perspectives on the Natural Gradient Method, 2014, J. Mach. Learn. Res.
[41] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[42] Yann Ollivier, et al. Riemannian metrics for neural networks I: feedforward networks, 2013, arXiv:1303.0818.
[43] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[44] Daniel Povey, et al. Krylov Subspace Descent for Deep Learning, 2011, AISTATS.
[45] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[46] James Martens, et al. Deep learning via Hessian-free optimization, 2010, ICML.
[47] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[48] Nicol N. Schraudolph, et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[49] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[50] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[51] Babak Hassibi, et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon, 1992, NIPS.
[52] Jorge Nocedal, et al. On the limited memory BFGS method for large scale optimization, 1989, Math. Program.
[53] K. Chard, et al. Deep Neural Network Training with Distributed K-FAC, 2022, IEEE Transactions on Parallel and Distributed Systems.
[54] Dan Alistarh, et al. Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization, 2021, arXiv.
[55] Kaare Brandt Petersen, et al. The Matrix Cookbook, 2006.