暂无分享,去创建一个
[1] H. Robbins,et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications , 1985 .
[2] Ruslan Salakhutdinov,et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks , 2015, NIPS.
[3] Alexander J. Smola,et al. Learning with kernels , 1998 .
[4] Harri Valpola,et al. Weight-averaged consistency targets improve semi-supervised deep learning results , 2017, ArXiv.
[5] Tomaso A. Poggio,et al. Regularization Theory and Neural Networks Architectures , 1995, Neural Computation.
[6] Alan J. Lee,et al. U-Statistics: Theory and Practice , 1990 .
[7] Nadav Cohen,et al. On the Expressive Power of Deep Learning: A Tensor Analysis , 2015, COLT 2016.
[8] Philip M. Long,et al. On the inductive bias of dropout , 2014, J. Mach. Learn. Res..
[9] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[10] A Tikhonov,et al. Solution of Incorrectly Formulated Problems and the Regularization Method , 1963 .
[11] Andreas Christmann,et al. Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.
[12] Leslie Pack Kaelbling,et al. Generalization in Deep Learning , 2017, ArXiv.
[13] Yoram Singer,et al. Train faster, generalize better: Stability of stochastic gradient descent , 2015, ICML.
[14] Léon Bottou,et al. Stochastic Gradient Descent Tricks , 2012, Neural Networks: Tricks of the Trade.
[15] G. Wahba. Splines in Nonparametric Regression , 2006 .
[16] Pierre Baldi,et al. Understanding Dropout , 2013, NIPS.
[17] Nando de Freitas,et al. An Introduction to MCMC for Machine Learning , 2004, Machine Learning.
[18] Eduardo Sontag. VC dimension of neural networks , 1998 .
[19] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[20] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Stephen A. Cook,et al. The complexity of theorem-proving procedures , 1971, STOC.
[22] eon BottouAT. Stochastic Gradient Learning in Neural Networks , 2022 .
[23] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[24] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[25] T. Poggio,et al. Networks and the best approximation property , 1990, Biological Cybernetics.
[26] Sebastian Nowozin,et al. On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[27] W. Hoeffding. Probability Inequalities for sums of Bounded Random Variables , 1963 .
[28] Franco Scarselli,et al. On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures , 2014, IEEE Transactions on Neural Networks and Learning Systems.
[29] S. Mendelson,et al. Regularization in kernel learning , 2010, 1001.2094.
[30] Harris Drucker,et al. Comparison of learning algorithms for handwritten digit recognition , 1995 .
[31] Yann LeCun,et al. Regularization of Neural Networks using DropConnect , 2013, ICML.
[32] Janos Galambos,et al. Advanced probability theory , 1988 .
[33] Anders Krogh,et al. A Simple Weight Decay Can Improve Generalization , 1991, NIPS.
[34] Razvan Pascanu,et al. Sharp Minima Can Generalize For Deep Nets , 2017, ICML.
[35] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[36] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.
[37] L. Eon Bottou. Online Learning and Stochastic Approximations , 1998 .
[38] Kenji Fukumizu,et al. Universality, Characteristic Kernels and RKHS Embedding of Measures , 2010, J. Mach. Learn. Res..
[39] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[40] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[41] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[42] Andrea Vedaldi,et al. MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.
[43] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[44] René Vidal,et al. Global Optimality in Neural Network Training , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] G. Casella,et al. Explaining the Gibbs Sampler , 1992 .
[46] Radford M. Neal. Slice Sampling , 2003, The Annals of Statistics.
[47] Andrew Zisserman,et al. A Visual Vocabulary for Flower Classification , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[48] Richard Hans Robert Hahnloser,et al. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit , 2000, Nature.
[49] Marek Karpinski,et al. Polynomial Bounds for VC Dimension of Sigmoidal and General Pfaffian Neural Networks , 1997, J. Comput. Syst. Sci..
[50] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[51] Andrew Zisserman,et al. Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[52] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[53] Colas Schretter,et al. Monte Carlo and Quasi-Monte Carlo Methods , 2016 .
[54] W. K. Hastings,et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .
[55] André Elisseeff,et al. Stability and Generalization , 2002, J. Mach. Learn. Res..
[56] Eugenio Culurciello,et al. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.
[57] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.