Yoshua Bengio | Min Lin | Aaron C. Courville | Fred A. Hamprecht | Devansh Arpit | Nasim Rahaman | Aristide Baratin | Felix Dräxler
[1] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.
[2] G. Lugosi, et al. Consistency of the k-Nearest Neighbor Rule, 1996.
[3] Haw-minn Lu. Geometric Properties of Image Manifolds, 1996.
[4] H. Jónsson, et al. Nudged elastic band method for finding minimum energy paths of transitions, 1998.
[5] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[6] G. Lewicki, et al. Approximation by Superpositions of a Sigmoidal Function, 2003.
[7] Barbara Hammer, et al. A Note on the Universal Approximation Capability of Support Vector Machines, 2003, Neural Processing Letters.
[8] P. Bartlett, et al. Local Rademacher complexities, 2005, math/0508275.
[9] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[10] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[11] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[12] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.
[13] Ryota Tomioka, et al. Norm-Based Capacity Control in Neural Networks, 2015, COLT.
[14] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, ICCV.
[15] E. L. Kolsbjerg, et al. An automated nudged elastic band method, 2016, The Journal of Chemical Physics.
[16] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[17] Surya Ganguli, et al. Exponential expressivity in deep neural networks through transient chaos, 2016, NIPS.
[18] Gintare Karolina Dziugaite, et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, 2017, UAI.
[19] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[20] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[21] Yoshua Bengio, et al. A Closer Look at Memorization in Deep Networks, 2017, ICML.
[22] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[23] Quoc V. Le, et al. Understanding Generalization and Stochastic Gradient Descent, 2017.
[24] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[25] Mikhail Belkin, et al. Diving into the shallows: a computational perspective on large-scale shallow learning, 2017, NIPS.
[26] David M. Blei, et al. Stochastic Gradient Descent as Approximate Bayesian Inference, 2017, J. Mach. Learn. Res.
[27] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, arXiv.
[28] Fred A. Hamprecht, et al. Essentially No Barriers in Neural Network Energy Landscape, 2018, ICML.
[29] Yoshua Bengio, et al. A Walk with SGD, 2018, arXiv.
[30] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.
[31] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2017, Information Theory and Applications Workshop (ITA).
[32] Michael Unser, et al. A representer theorem for deep neural networks, 2018, J. Mach. Learn. Res.
[33] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.