Function Norms and Regularization in Deep Networks