GSdyn: Learning training dynamics via online Gaussian optimization with gradient states