Richard Socher, Kyunghyun Cho, Caiming Xiong, Krzysztof Geras, Devansh Arpit, Stanislaw Jastrzebski, Oliver Astrand, Giancarlo Kerg, Huan Wang
[1] M. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines, 1989.
[2] Yang Yuan, et al. Asymmetric Valleys: Beyond Sharp and Flat Local Minima, 2019, NeurIPS.
[3] Ethan Dyer, et al. Gradient Descent Happens in a Tiny Subspace, 2018, ArXiv.
[4] David J. Schwab, et al. The Early Phase of Neural Network Training, 2020, ICLR.
[5] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[6] Andrew Gordon Wilson, et al. Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited, 2020, ArXiv.
[7] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.
[8] Shun-ichi Amari, et al. Universal statistics of Fisher information in deep neural networks: mean field approach, 2018, AISTATS.
[9] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[10] Jascha Sohl-Dickstein, et al. The large learning rate phase of deep learning: the catapult mechanism, 2020, ArXiv.
[11] Srini Narayanan, et al. Stiffness: A New Perspective on Generalization in Neural Networks, 2019, ArXiv.
[12] Yuichi Yoshida, et al. Spectral Norm Regularization for Improving the Generalizability of Deep Learning, 2017, ArXiv.
[13] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[14] Aleksander Madry, et al. The Two Regimes of Deep Network Training, 2020, ArXiv.
[15] Di Huang, et al. Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels, 2020, ICML.
[16] Max Welling, et al. Gradient ℓ1 Regularization for Quantization Robustness, 2020, ArXiv.
[17] Satrajit Chatterjee, et al. Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization, 2020, ICLR.
[18] Harris Drucker, et al. Improving generalization performance using double backpropagation, 1992, IEEE Trans. Neural Networks.
[19] Chunpeng Wu, et al. SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning, 2018, ArXiv.
[20] Tomaso A. Poggio, et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks, 2017, AISTATS.
[21] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[22] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[23] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[24] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[25] Yoshua Bengio, et al. How to Initialize your Network? Robust Initialization for WeightNorm & ResNets, 2019, NeurIPS.
[26] Nicolas Le Roux, et al. On the interplay between noise and curvature and its effect on optimization and generalization, 2019, AISTATS.
[27] Behnam Neyshabur, et al. Implicit Regularization in Deep Learning, 2017, ArXiv.
[28] Junnan Li, et al. DivideMix: Learning with Noisy Labels as Semi-supervised Learning, 2020, ICLR.
[29] Yoshua Bengio, et al. On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length, 2018, ICLR.
[30] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, CVPR.
[31] Quoc V. Le, et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent, 2017, ICLR.
[32] Jie Fu, et al. Jacobian Adversarially Regularized Networks for Robustness, 2020, ICLR.
[33] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, ArXiv.
[34] Samuel L. Smith, et al. Batch Normalization Biases Deep Residual Networks Towards Shallow Paths, 2020, ArXiv.
[35] Hossein Mobahi, et al. Fantastic Generalization Measures and Where to Find Them, 2019, ICLR.
[36] Aaron C. Courville, et al. Improved Training of Wasserstein GANs, 2017, NIPS.
[37] Ya Le, et al. Tiny ImageNet Visual Recognition Challenge, 2015.
[38] David G. T. Barrett, et al. Implicit Gradient Regularization, 2020, ArXiv.
[39] Pascal Vincent, et al. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction, 2011, ICML.
[40] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[41] Zhi-Qin John Xu, et al. Understanding training and generalization in deep learning by Fourier analysis, 2018, ArXiv.
[42] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[43] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[44] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[45] Stefano Soatto, et al. Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence, 2019, NeurIPS.
[46] James Martens, et al. New Insights and Perspectives on the Natural Gradient Method, 2014, J. Mach. Learn. Res.
[47] Judy Hoffman, et al. Robust Learning with Jacobian Regularization, 2019, ArXiv.
[48] Lorenzo Rosasco, et al. Theory of Deep Learning III: explaining the non-overfitting puzzle, 2017, ArXiv.
[49] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[50] Stefano Soatto, et al. Critical Learning Periods in Deep Networks, 2018, ICLR.
[51] Masashi Sugiyama, et al. Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis, 2019, ICML.
[52] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[53] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[54] Tengyu Ma, et al. Robust and On-the-fly Dataset Denoising for Image Classification, 2020, ECCV.
[55] Kyunghyun Cho, et al. The Break-Even Point on Optimization Trajectories of Deep Neural Networks, 2020, ICLR.
[56] Dániel Varga, et al. Gradient Regularization Improves Accuracy of Discriminative Models, 2017, ArXiv.
[57] Jeffrey Pennington, et al. The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks, 2020, NeurIPS.