Weight Expansion: A New Perspective on Dropout and Generalization