Learning Neural Networks with Adaptive Regularization
Han Zhao | Yao-Hung Hubert Tsai | Ruslan Salakhutdinov | Geoffrey J. Gordon