暂无分享,去创建一个
[1] Yann LeCun,et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks , 2018, ArXiv.
[2] Nathan Srebro,et al. Dropout: Explicit Forms and Capacity Control , 2020, ICML.
[3] Richard Socher,et al. An Analysis of Neural Language Modeling at Multiple Scales , 2018, ArXiv.
[4] René Vidal,et al. Dropout as a Low-Rank Regularizer for Matrix Factorization , 2017, AISTATS.
[5] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[6] Nathan Srebro,et al. The Implicit Bias of Gradient Descent on Separable Data , 2017, J. Mach. Learn. Res..
[7] Richard Socher,et al. Quasi-Recurrent Neural Networks , 2016, ICLR.
[8] Moustapha Cissé,et al. Efficient softmax approximation for GPUs , 2016, ICML.
[9] Matus Telgarsky,et al. Spectrally-normalized margin bounds for neural networks , 2017, NIPS.
[10] Richard Socher,et al. Improving Generalization Performance by Switching from Adam to SGD , 2017, ArXiv.
[11] Colin Wei,et al. Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation , 2019, NeurIPS.
[12] Ambuj Tewari,et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization , 2008, NIPS.
[13] Colin Wei,et al. Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin , 2019, ArXiv.
[14] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[15] Kurt Keutzer,et al. Hessian-based Analysis of Large Batch Training and Robustness to Adversaries , 2018, NeurIPS.
[16] Judy Hoffman,et al. Robust Learning with Jacobian Regularization , 2019, ArXiv.
[17] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[18] Richard Socher,et al. Revisiting Activation Regularization for Language RNNs , 2017, ArXiv.
[19] Yaoliang Yu,et al. Dropout with Expectation-linear Regularization , 2016, ICLR.
[20] Ohad Shamir,et al. Size-Independent Sample Complexity of Neural Networks , 2017, COLT.
[21] Shin-ichi Maeda,et al. A Bayesian encourages dropout , 2014, ArXiv.
[22] Raman Arora,et al. On Dropout and Nuclear Norm Regularization , 2019, ICML.
[23] Quoc V. Le,et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent , 2017, ICLR.
[24] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[25] Gintare Karolina Dziugaite,et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data , 2017, UAI.
[26] Yann Dauphin,et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks , 2017, ICLR.
[27] Philip M. Long,et al. On the inductive bias of dropout , 2014, J. Mach. Learn. Res..
[28] Zoubin Ghahramani,et al. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks , 2015, NIPS.
[29] Erhardt Barth,et al. Recurrent Dropout without Memory Loss , 2016, COLING.
[30] Nathan Srebro,et al. Characterizing Implicit Bias in Terms of Optimization Geometry , 2018, ICML.
[31] Francis R. Bach,et al. Self-concordant analysis for logistic regression , 2009, ArXiv.
[32] Nathan Srebro,et al. Kernel and Deep Regimes in Overparametrized Models , 2019, ArXiv.
[33] J. Zico Kolter,et al. Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience , 2019, ICLR.
[34] Sho Yaida,et al. Fluctuation-dissipation relations for stochastic gradient descent , 2018, ICLR.
[35] Ambuj Tewari,et al. Smoothness, Low Noise and Fast Rates , 2010, NIPS.
[36] Yoshua Bengio,et al. On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length , 2018, ICLR.
[37] Sida I. Wang,et al. Altitude Training: Strong Bounds for Single-Layer Dropout , 2014, NIPS.
[38] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[39] Tong Zhang,et al. Covering Number Bounds of Certain Regularized Linear Function Classes , 2002, J. Mach. Learn. Res..
[40] Colin Wei,et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks , 2019, NeurIPS.
[41] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.
[42] Roland Memisevic,et al. Regularizing RNNs by Stabilizing Activations , 2015, ICLR.
[43] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[44] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[45] Stefano Soatto,et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks , 2017, 2018 Information Theory and Applications Workshop (ITA).
[46] Raman Arora,et al. On the Implicit Bias of Dropout , 2018, ICML.
[47] Yi Zhang,et al. Stronger generalization bounds for deep nets via a compression approach , 2018, ICML.
[48] Hongyang Zhang,et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations , 2017, COLT.
[49] David J Schwab,et al. How noise affects the Hessian spectrum in overparameterized neural networks , 2019, ArXiv.
[50] Yoshua Bengio,et al. Three Factors Influencing Minima in SGD , 2017, ArXiv.
[51] David A. McAllester,et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks , 2017, ICLR.
[52] Yann LeCun,et al. Regularization of Neural Networks using DropConnect , 2013, ICML.
[53] Nathan Srebro,et al. Implicit Regularization in Matrix Factorization , 2017, 2018 Information Theory and Applications Workshop (ITA).
[54] Jascha Sohl-Dickstein,et al. Sensitivity and Generalization in Neural Networks: an Empirical Study , 2018, ICLR.
[55] Guillermo Sapiro,et al. Robust Large Margin Deep Neural Networks , 2016, IEEE Transactions on Signal Processing.
[56] Nathan Srebro,et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks , 2018, NeurIPS.
[57] Richard Socher,et al. Regularizing and Optimizing LSTM Language Models , 2017, ICLR.
[58] Yoshua Bengio,et al. A Walk with SGD , 2018, ArXiv.
[59] Zhanxing Zhu,et al. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects , 2018, ICML.
[60] Zoubin Ghahramani,et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.
[61] Sanjeev Arora,et al. Implicit Regularization in Deep Matrix Factorization , 2019, NeurIPS.
[62] Sida I. Wang,et al. Dropout Training as Adaptive Regularization , 2013, NIPS.
[63] Jian Pei,et al. Demystifying Dropout , 2019, ICML.
[64] Pierre Baldi,et al. Understanding Dropout , 2013, NIPS.
[65] Ann Bies,et al. The Penn Treebank: Annotating Predicate Argument Structure , 1994, HLT.
[66] O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms , 2002 .
[67] Christopher D. Manning,et al. Fast dropout training , 2013, ICML.
[68] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[69] Philip M. Long,et al. Surprising properties of dropout in deep networks , 2017, COLT.
[70] Nathan Srebro,et al. Kernel and Rich Regimes in Overparametrized Models , 2019, COLT.
[71] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.