MetaInit: Initializing learning by learning to initialize
[1] Brian McWilliams, et al. The Shattered Gradients Problem: If resnets are the answer, then what is the question?, 2017, ICML.
[2] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[3] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Surya Ganguli, et al. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice, 2017, NIPS.
[5] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[6] Ning Qian, et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[7] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[8] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[9] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[10] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[11] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[12] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[13] Jascha Sohl-Dickstein, et al. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks, 2018, ICML.
[14] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), 1983.
[15] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[16] Mark W. Schmidt, et al. Online Learning Rate Adaptation with Hypergradient Descent, 2017, ICLR.
[17] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.
[18] Samuel S. Schoenholz, et al. Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks, 2018, ICML.
[19] Yoshua Bengio, et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies, 2001.
[20] Tengyu Ma, et al. Fixup Initialization: Residual Learning Without Normalization, 2019, ICLR.
[21] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[22] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[23] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[24] Ryan P. Adams, et al. Gradient-based Hyperparameter Optimization through Reversible Learning, 2015, ICML.
[25] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[26] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[27] Deborah Silver, et al. Feature Visualization, 1994, Scientific Visualization.
[28] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[29] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[30] Surya Ganguli, et al. Deep Information Propagation, 2016, ICLR.
[31] Kaiming He, et al. Group Normalization, 2018, ECCV.
[32] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[33] Samuel S. Schoenholz, et al. Mean Field Residual Networks: On the Edge of Chaos, 2017, NIPS.
[34] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[35] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[36] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[37] Shankar Krishnan, et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density, 2019, ICML.
[38] M. Abràmoff, et al. Improved Automated Detection of Diabetic Retinopathy on a Publicly Available Dataset Through Integration of Deep Learning, 2016, Investigative Ophthalmology & Visual Science.
[39] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[40] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[41] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[42] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[43] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[44] Jiri Matas, et al. All you need is a good init, 2015, ICLR.
[45] Ethan Dyer, et al. Gradient Descent Happens in a Tiny Subspace, 2018, ArXiv.
[46] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[47] Samuel S. Schoenholz, et al. Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs, 2019, ArXiv.
[48] Jaime G. Carbonell, et al. The Nonlinearity Coefficient - Predicting Generalization in Deep Neural Networks, 2018, ArXiv:1806.00179.
[49] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[50] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[51] Quoc V. Le, et al. Searching for Activation Functions, 2018, ArXiv.
[52] Jascha Sohl-Dickstein, et al. A Mean Field Theory of Batch Normalization, 2019, ICLR.
[53] Jeremy Nixon, et al. Understanding and correcting pathologies in the training of learned optimizers, 2018, ICML.
[54] Jascha Sohl-Dickstein, et al. Meta-Learning Update Rules for Unsupervised Representation Learning, 2018, ICLR.
[55] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[56] Ilya Sutskever, et al. Training Deep and Recurrent Networks with Hessian-Free Optimization, 2012, Neural Networks: Tricks of the Trade.
[57] Timothy Dozat. Incorporating Nesterov Momentum into Adam, 2016.
[58] Levent Sagun, et al. Scaling description of generalization with number of parameters in deep learning, 2019, Journal of Statistical Mechanics: Theory and Experiment.
[59] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[60] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.