Directional convergence and alignment in deep learning
[1] Francis Bach, et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss, 2020, COLT.
[2] Hossein Mobahi, et al. Fantastic Generalization Measures and Where to Find Them, 2019, ICLR.
[3] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[4] Dmitriy Drusvyatskiy, et al. Stochastic Subgradient Method Converges on Tame Functions, 2018, Foundations of Computational Mathematics.
[5] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[6] Matus Telgarsky, et al. A refined primal-dual analysis of the implicit bias, 2019, arXiv.
[7] Andrea Montanari, et al. Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit, 2019, COLT.
[8] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, arXiv.
[9] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[10] Jascha Sohl-Dickstein, et al. Measuring the Effects of Data Parallelism on Neural Network Training, 2018, J. Mach. Learn. Res.
[11] Colin Wei, et al. Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel, 2018, NeurIPS.
[12] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[13] Hossein Mobahi, et al. Predicting the Generalization Gap in Deep Networks with Margin Distributions, 2018, ICLR.
[14] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[15] Been Kim, et al. Sanity Checks for Saliency Maps, 2018, NeurIPS.
[16] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[17] Nathan Srebro, et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[18] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[19] Matus Telgarsky, et al. Risk and parameter convergence of logistic regression, 2018, arXiv.
[20] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[21] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[22] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[23] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, CVPR.
[24] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[25] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[26] Ta Lê Loi, et al. Lecture 1: O-minimal structures, 2010.
[27] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[28] V. Grandjean, et al. On the limit set at infinity of a gradient trajectory of a semialgebraic function, 2007.
[29] Adrian S. Lewis, et al. Clarke Subgradients of Stratifiable Functions, 2006, SIAM J. Optim.
[30] Adam Parusinski, et al. Quasi-convex decomposition in o-minimal structures. Application to the gradient conjecture, 2006.
[31] M. Coste. An Introduction to O-minimal Geometry, 2002.
[32] K. Kurdyka, et al. Semialgebraic Sard Theorem for Generalized Critical Values, 2000.
[33] Adrian S. Lewis, et al. Convex Analysis and Nonlinear Optimization, 2000.
[34] K. Kurdyka, et al. Proof of the gradient conjecture of R. Thom, 1999, math/9906212.
[35] K. Kurdyka. On gradients of functions definable in o-minimal structures, 1998.
[36] L. van den Dries, et al. Geometric categories and o-minimal structures, 1996.
[37] A. Wilkie. Model completeness results for expansions of the ordered field of real numbers by restricted Pfaffian functions and the exponential function, 1996.
[38] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.
[39] András Némethi, et al. Milnor fibration at infinity, 1992.
[40] F. Clarke. Optimization and Nonsmooth Analysis, 1983.
[41] F. Clarke. Generalized gradients and applications, 1975.