Data-Dependent Path Normalization in Neural Networks

We propose a unified framework for neural network normalization, regularization, and optimization that includes Path-SGD and Batch Normalization and interpolates between them along two different dimensions. Through this framework we investigate the invariance properties of the optimization, its data dependence, and the connection with natural gradients.
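To make the two endpoints of the interpolation concrete, here is a minimal sketch of the ingredients involved: the squared path norm of a two-layer ReLU network (the quantity Path-SGD regularizes) and a plain batch-normalization step on the hidden pre-activations. The network sizes, variable names, and helper functions are ours for illustration and are not taken from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, batch = 4, 8, 3, 32
W1 = rng.normal(size=(d_in, d_hidden))   # input -> hidden weights
W2 = rng.normal(size=(d_hidden, d_out))  # hidden -> output weights
X = rng.normal(size=(batch, d_in))

def path_norm_sq(W1, W2):
    """Squared path norm: sum over all input->output paths of the
    product of squared weights along the path. For two layers this is
    sum_{i,j,k} W1[i,j]^2 * W2[j,k]^2."""
    return float(np.sum((W1 ** 2) @ (W2 ** 2)))

def batch_norm(z, eps=1e-5):
    """Per-unit batch normalization of pre-activations z (batch x units);
    the learned scale/shift parameters are omitted for brevity."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

# Forward pass with batch-normalized hidden pre-activations.
h = np.maximum(batch_norm(X @ W1), 0.0)  # ReLU after batch norm
y = h @ W2

print("squared path norm:", path_norm_sq(W1, W2))
print("output shape:", y.shape)
```

The path norm is invariant to the node-wise rescalings that leave a ReLU network's function unchanged, while batch normalization makes the hidden representation depend on the statistics of the data batch; the framework's two dimensions trade off exactly these kinds of invariance and data dependence.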
