Amortized Proximal Optimization
[2] M. Ghassemi, et al. If Influence Functions are the Answer, Then What is the Question?, 2022, NeurIPS.
[3] Timothy M. Hospedales, et al. Meta Mirror Descent: Optimiser Learning for Fast Convergence, 2022, ArXiv.
[4] Paul Vicol, et al. Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies, 2021, ICML.
[5] Yue Wu, et al. SKFAC: Training Neural Networks with Faster Kronecker-Factored Approximate Curvature, 2021, CVPR.
[6] Samuel S. Schoenholz, et al. Whitening and Second Order Optimization Both Make Information in the Dataset Unusable During Training, and Can Reduce or Prevent Generalization, 2020, ICML.
[7] Amos Storkey, et al. Non-greedy Gradient-based Hyperparameter Optimization Over Long Horizons, 2020, ArXiv.
[8] Sharan Vaswani, et al. Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence, 2020, AISTATS.
[9] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[10] Jianfeng Gao, et al. SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization, 2019, ACL.
[11] David Duvenaud, et al. Optimizing Millions of Hyperparameters by Implicit Differentiation, 2019, AISTATS.
[12] P. Frasconi, et al. Marthe: Scheduling the Learning Rate Via Online Hypergradients, 2019, IJCAI.
[13] Theodore H. Moskovitz, et al. First-Order Preconditioning via Hypergradient Descent, 2019, ArXiv.
[14] Andrei A. Rusu, et al. Meta-Learning with Warped Gradient Descent, 2019, ICLR.
[15] Guodong Zhang, et al. Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model, 2019, NeurIPS.
[16] James Martens, et al. Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks, 2019, NeurIPS.
[17] Mark W. Schmidt, et al. Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates, 2019, NeurIPS.
[18] S. Kakade, et al. Revisiting the Polyak step size, 2019, arXiv:1905.00313.
[19] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[20] Angelika Steger, et al. Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning, 2019, ICML.
[21] Junier B. Oliva, et al. Meta-Curvature, 2019, NeurIPS.
[22] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[23] Aaron Mishkin, et al. SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient, 2018, NeurIPS.
[24] John C. Duchi, et al. Stochastic (Approximate) Proximal Point Methods: Convergence, Optimality, and Adaptivity, 2018, SIAM J. Optim.
[25] Sanjeev Arora, et al. Theoretical Analysis of Auto Rate-Tuning by Batch Normalization, 2018, ICLR.
[26] Jeremy Nixon, et al. Learned optimizers that outperform SGD on wall-clock and validation loss, 2018, ArXiv.
[27] Stephen J. Wright, et al. Numerical Optimization, 2006, Springer.
[28] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[29] Pascal Vincent, et al. Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis, 2018, NeurIPS.
[30] Angelika Steger, et al. Approximating Real-Time Recurrent Learning with Random Kronecker Factors, 2018, NeurIPS.
[31] David Rolnick, et al. Measuring and regularizing networks in function space, 2018, ICLR.
[32] Richard S. Zemel, et al. Aggregated Momentum: Stability Through Passive Damping, 2018, ICLR.
[33] Yoram Singer, et al. Shampoo: Preconditioned Stochastic Tensor Optimization, 2018, ICML.
[34] Roger B. Grosse, et al. Understanding Short-Horizon Bias in Stochastic Meta-Optimization, 2018, ICLR.
[35] Jimmy Ba, et al. Kronecker-factored Curvature Approximations for Recurrent Neural Networks, 2018, ICLR.
[36] Pascal Vincent, et al. An Evaluation of Fisher Approximations Beyond Kronecker Factorization, 2018, ICLR.
[37] Georg Martius, et al. L4: Practical loss-based stepsize adaptation for deep learning, 2018, NeurIPS.
[38] Seungjin Choi, et al. Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace, 2018, ICML.
[39] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[40] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[41] Hang Li, et al. Meta-SGD: Learning to Learn Quickly for Few Shot Learning, 2017, ArXiv.
[42] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[43] Mark W. Schmidt, et al. Online Learning Rate Adaptation with Hypergradient Descent, 2017, ICLR.
[44] Misha Denil, et al. Learned Optimizers that Scale and Generalize, 2017, ICML.
[45] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[46] Paolo Frasconi, et al. Forward and Reverse Gradient-Based Hyperparameter Optimization, 2017, ICML.
[47] Jitendra Malik, et al. Learning to Optimize Neural Nets, 2017, ArXiv.
[48] Yann Ollivier, et al. Unbiased Online Recurrent Optimization, 2017, ICLR.
[49] Kevin G. Jamieson, et al. Hyperband: Bandit-Based Configuration Evaluation for Hyperparameter Optimization, 2016, ICLR.
[50] Sergio Gomez Colmenarejo, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[51] Jitendra Malik, et al. Learning to Optimize, 2016, ICLR.
[52] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[53] James Martens. Second-order Optimization for Neural Networks, 2016.
[54] Roger B. Grosse, et al. A Kronecker-factored approximate Fisher matrix for convolution layers, 2016, ICML.
[55] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[56] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[57] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[58] Prabhat, et al. Scalable Bayesian Optimization Using Deep Neural Networks, 2015, ICML.
[59] Ryan P. Adams, et al. Gradient-based Hyperparameter Optimization through Reversible Learning, 2015, ICML.
[60] Christian Szegedy, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[61] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[62] James Martens, et al. New Insights and Perspectives on the Natural Gradient Method, 2014, J. Mach. Learn. Res.
[63] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[64] Jasper Snoek, et al. Freeze-Thaw Bayesian Optimization, 2014, ArXiv.
[65] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[66] Yoshua Bengio, et al. Practical Recommendations for Gradient-Based Training of Deep Architectures, 2012, Neural Networks: Tricks of the Trade.
[67] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[68] Justin Domke, et al. Generic Methods for Optimization-Based Modeling, 2012, AISTATS.
[69] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.
[70] Ilya Sutskever, et al. Learning Recurrent Neural Networks with Hessian-Free Optimization, 2011, ICML.
[71] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[72] James Martens, et al. Deep learning via Hessian-free optimization, 2010, ICML.
[73] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[74] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[75] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[76] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[77] H. Robbins. A Stochastic Approximation Method, 1951.
[78] Marcello Federico, et al. Report on the 11th IWSLT evaluation campaign, 2014, IWSLT.
[79] Andrew Y. Ng, et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.
[80] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[81] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.
[82] Yann LeCun, et al. Improving the convergence of back-propagation learning with second-order methods, 1989.