Ping Luo | Xinjiang Wang | Zhanglin Peng | Wenqi Shao
[1] M. Opper,et al. On the ability of the optimal perceptron to generalise , 1990 .
[2] Sompolinsky,et al. Statistical mechanics of learning from examples. , 1992, Physical review. A, Atomic, molecular, and optical physics.
[3] J. Hertz,et al. Generalization in a linear perceptron in the presence of noise , 1992 .
[4] David Saad,et al. Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks , 1995, NIPS.
[5] Christopher M. Bishop,et al. Training with Noise is Equivalent to Tikhonov Regularization , 1995, Neural Computation.
[6] S. Bös. STATISTICAL MECHANICS APPROACH TO EARLY STOPPING AND WEIGHT DECAY , 1998 .
[7] M. Opper,et al. Dynamics of batch training in a perceptron , 1998 .
[8] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[9] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[10] Pascal Vincent,et al. Adding noise to the input of a model trained with a regularized objective , 2011, ArXiv.
[11] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[12] Sida I. Wang,et al. Dropout Training as Adaptive Regularization , 2013, NIPS.
[13] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[14] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[15] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[16] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[17] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[20] Andrea Vedaldi,et al. Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.
[21] Surya Ganguli,et al. On the Expressive Power of Deep Neural Networks , 2016, ICML.
[22] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[23] Yuandong Tian,et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis , 2017, ICML.
[24] Boris Ginsburg,et al. Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification , 2017, ArXiv.
[25] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[26] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Twan van Laarhoven,et al. L2 Regularization versus Batch and Weight Normalization , 2017, ArXiv.
[28] Rina Panigrahy,et al. Electron-Proton Dynamics in Deep Learning , 2017, ArXiv.
[29] S. Amari,et al. Statistical Mechanical Analysis of Online Learning with Weight Normalization in Single Layer Perceptron , 2017 .
[30] Ping Luo,et al. EigenNet: Towards Fast and Structural Learning of Deep Neural Networks , 2017, IJCAI.
[31] Jeffrey Pennington,et al. Geometry of Neural Network Loss Surfaces via Random Matrix Theory , 2017, ICML.
[32] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[33] David M. Blei,et al. Stochastic Gradient Descent as Approximate Bayesian Inference , 2017, J. Mach. Learn. Res..
[34] Ping Luo,et al. Learning Deep Architectures via Generalized Whitened Neural Networks , 2017, ICML.
[35] Amir Globerson,et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs , 2017, ICML.
[36] Ruimao Zhang,et al. Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct? , 2018, ArXiv.
[37] Garud Iyengar,et al. Robust Implicit Backpropagation , 2018, ArXiv.
[38] Matthew Botvinick,et al. On the importance of single directions for generalization , 2018, ICLR.
[39] A. Montanari,et al. The landscape of empirical risk for nonconvex losses , 2016, The Annals of Statistics.
[40] Aleksander Madry,et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) , 2018, NeurIPS.
[41] Kevin Smith,et al. Bayesian Uncertainty Estimation for Batch Normalized Deep Networks , 2018, ICML.
[42] Ruimao Zhang,et al. SSN: Learning Sparse Switchable Normalization via SparsestMax , 2019, International Journal of Computer Vision.
[43] Xiang Li,et al. Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Ping Luo,et al. Differentiable Learning-to-Normalize via Switchable Normalization , 2018, ICLR.
[45] Xiaoou Tang,et al. Switchable Whitening for Deep Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[46] Andrew M. Saxe,et al. High-dimensional dynamics of generalization error in neural networks , 2017, Neural Networks.