An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
[1] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[2] Jason Yosinski, et al. Measuring the Intrinsic Dimension of Objective Landscapes, 2018, ICLR.
[3] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[4] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[5] Hao Li, et al. Visualizing the Loss Landscape of Neural Nets, 2017, NeurIPS.
[6] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[7] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[8] Yousef Saad, et al. Fast Estimation of tr(f(A)) via Stochastic Lanczos Quadrature, 2017, SIAM J. Matrix Anal. Appl.
[9] Yann LeCun, et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond, 2016, arXiv:1611.07476.
[10] Ryan P. Adams, et al. Estimating the Spectral Density of Large Implicit Matrices, 2018, arXiv:1802.03451.
[11] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[12] Laurent Demanet. On Chebyshev interpolation of analytic functions, 2010.
[13] Gene H. Golub, et al. Calculation of Gauss quadrature rules, 1967, Milestones in Matrix Computation.
[14] Vardan Papyan, et al. The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size, 2018.
[15] Nico M. Temme, et al. Numerical methods for special functions, 2007.
[16] Kurt Keutzer, et al. Hessian-based Analysis of Large Batch Training and Robustness to Adversaries, 2018, NeurIPS.
[17] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.
[18] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, 1950.
[19] Stanislav Fort, et al. The Goldilocks zone: Towards better understanding of neural network loss landscapes, 2018, AAAI.
[20] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[21] G. Golub, et al. Matrices, Moments and Quadrature with Applications, 2009.
[22] Xaq Pitkow, et al. Skip Connections Eliminate Singularities, 2017, ICLR.
[23] Vardan Papyan, et al. The Full Spectrum of Deep Net Hessians At Scale: Dynamics with Sample Size, 2018, ArXiv.
[24] Lei Wu, et al. Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes, 2017, ArXiv.
[25] Yousef Saad, et al. Approximating Spectral Densities of Large Matrices, 2013, SIAM Rev.
[26] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[27] Pierre C. Bellec. Concentration of quadratic forms under a Bernstein moment assumption, 2019, arXiv:1901.08736.
[28] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, ArXiv.
[29] Fred A. Hamprecht, et al. Essentially No Barriers in Neural Network Energy Landscape, 2018, ICML.
[30] G. Golub, et al. Some large-scale matrix computation problems, 1996.
[31] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[32] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[33] Ethan Dyer, et al. Gradient Descent Happens in a Tiny Subspace, 2018, ArXiv.
[34] Aleksander Madry, et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), 2018, NeurIPS.