[1] Nathan Srebro, et al. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models, 2019, ICML.
[2] Yi Zhou, et al. When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?, 2018.
[3] Matus Telgarsky, et al. The implicit bias of gradient descent on nonseparable data, 2019, COLT.
[4] Nathan Srebro, et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[5] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[6] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[7] Ruslan Salakhutdinov, et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks, 2015, NIPS.
[8] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[9] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[10] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[11] Qian Qian, et al. The Implicit Bias of AdaGrad on Separable Data, 2019, NeurIPS.
[12] Tuo Zhao, et al. Implicit Bias of Gradient Descent based Adversarial Training on Separable Data, 2020, ICLR.
[13] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[16] Sebastian Ruder. An overview of gradient descent optimization algorithms, 2016, arXiv.
[17] Xu Sun, et al. Adaptive Gradient Methods with Dynamic Bound of Learning Rate, 2019, ICLR.
[18] Brian Kingsbury, et al. New types of deep neural network learning for speech recognition and related applications: an overview, 2013, IEEE ICASSP.
[19] Pan Zhou, et al. Towards Theoretically Understanding Why SGD Generalizes Better Than Adam in Deep Learning, 2020, NeurIPS.
[20] J. Duncan, et al. AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients, 2020, NeurIPS.
[21] Matus Telgarsky, et al. Directional convergence and alignment in deep learning, 2020, NeurIPS.