Path-SGD: Path-Normalized Optimization in Deep Neural Networks
Behnam Neyshabur | Ruslan Salakhutdinov | Nathan Srebro