暂无分享,去创建一个
[1] Stephen A. Cook,et al. The complexity of theorem-proving procedures , 1971, STOC.
[2] H. Robbins,et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications , 1985 .
[3] Alan J. Lee,et al. U-Statistics: Theory and Practice , 1990 .
[4] Anders Krogh,et al. A Simple Weight Decay Can Improve Generalization , 1991, NIPS.
[5] Harris Drucker,et al. Comparison of learning algorithms for handwritten digit recognition , 1995 .
[6] Tomaso A. Poggio,et al. Regularization Theory and Neural Networks Architectures , 1995, Neural Computation.
[7] Jürgen Schmidhuber,et al. Flat Minima , 1997, Neural Computation.
[8] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[9] L. Eon Bottou. Online Learning and Stochastic Approximations , 1998 .
[10] Alexander J. Smola,et al. Learning with kernels , 1998 .
[11] Richard Hans Robert Hahnloser,et al. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit , 2000, Nature.
[12] André Elisseeff,et al. Stability and Generalization , 2002, J. Mach. Learn. Res..
[13] Andrew Zisserman,et al. A Visual Vocabulary for Flower Classification , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[14] Andrew Zisserman,et al. Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[15] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[16] Sebastian Nowozin,et al. On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[17] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[18] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[19] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.
[20] Léon Bottou,et al. Stochastic Gradient Descent Tricks , 2012, Neural Networks: Tricks of the Trade.
[21] Torsten Rohlfing,et al. Image Similarity and Tissue Overlaps as Surrogates for Image Registration Accuracy: Widely Used but Unreliable , 2012, IEEE Transactions on Medical Imaging.
[22] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[23] Ryota Tomioka,et al. Norm-Based Capacity Control in Neural Networks , 2015, COLT.
[24] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[25] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[26] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[27] Yoram Singer,et al. Train faster, generalize better: Stability of stochastic gradient descent , 2015, ICML.
[28] Iasonas Kokkinos,et al. Sub-cortical brain structure segmentation using F-CNN'S , 2016, 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI).
[29] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[30] Razvan Pascanu,et al. Sharp Minima Can Generalize For Deep Nets , 2017, ICML.
[31] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[32] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[33] eon BottouAT. Stochastic Gradient Learning in Neural Networks , 2022 .