Training Deep Neural Networks
[1] Alex Krizhevsky, et al. One weird trick for parallelizing convolutional neural networks, 2014, ArXiv.
[2] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[3] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[4] Misha Denil, et al. Predicting Parameters in Deep Learning, 2014.
[5] D. E. Rumelhart, et al. Learning internal representations by back-propagating errors, 1986.
[6] Hao Yu, et al. Levenberg-Marquardt Training, 2011.
[7] P. Werbos, et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.
[8] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[9] David G. Luenberger, et al. Linear and nonlinear programming, 1984.
[10] Rich Caruana, et al. Model compression, 2006, KDD '06.
[11] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[12] Yoshua Bengio, et al. Algorithms for Hyper-Parameter Optimization, 2011, NIPS.
[13] Razvan Pascanu, et al. Natural Neural Networks, 2015, NIPS.
[14] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[15] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[16] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.
[17] Yoshua Bengio, et al. Maxout Networks, 2013, ICML.
[18] M. Hestenes, et al. Methods of conjugate gradients for solving linear systems, 1952.
[19] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[20] Razvan Pascanu, et al. Understanding the exploding gradient problem, 2012, ArXiv.
[21] Tim Dettmers, et al. 8-Bit Approximations for Parallelism in Deep Learning, 2015, ICLR.
[22] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[23] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[24] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[25] Tao Wang, et al. Deep learning with COTS HPC systems, 2013, ICML.
[26] H. Sebastian Seung, et al. Permitted and Forbidden Sets in Symmetric Threshold-Linear Networks, 2003, Neural Computation.
[27] Quoc V. Le, et al. On optimization methods for deep learning, 2011, ICML.
[28] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[29] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[30] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[31] Forrest N. Iandola, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size, 2016, ArXiv.
[32] Yixin Chen, et al. Compressing Neural Networks with the Hashing Trick, 2015, ICML.
[33] Paul J. Werbos, et al. The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, 1994.
[34] Rich Caruana, et al. Do Deep Nets Really Need to be Deep?, 2013, NIPS.
[35] Robert A. Jacobs, et al. Increased rates of convergence through learning rate adaptation, 1987, Neural Networks.
[36] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[37] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[38] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[39] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[40] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[41] David J. C. MacKay, et al. A Practical Bayesian Framework for Backpropagation Networks, 1992, Neural Computation.
[42] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[43] James Martens, et al. Deep learning via Hessian-free optimization, 2010, ICML.
[44] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[45] Mokhtar S. Bazaraa, et al. Nonlinear Programming: Theory and Algorithms, 1993.
[46] Yann LeCun, et al. Improving the convergence of back-propagation learning with second-order methods, 1989.
[47] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[48] Henry J. Kelley, et al. Gradient Theory of Optimal Flight Paths, 1960.
[49] Ravindra K. Ahuja, et al. Network Flows: Theory, Algorithms, and Applications, 1993.
[50] Kevin Leyton-Brown, et al. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms, 2012, KDD.
[51] E. Polak, et al. Computational methods in optimization: a unified approach, 1972.
[52] Yann LeCun, et al. What is the best multi-stage architecture for object recognition?, 2009, 2009 IEEE 12th International Conference on Computer Vision.
[53] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[54] J. Shewchuk. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994.
[55] Stephen J. Wright, et al. Numerical Optimization, 1999, Springer.
[56] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[57] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Christopher M. Bishop, et al. Neural networks for pattern recognition, 1995.
[59] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[60] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[61] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[62] Ilya Sutskever, et al. Learning Recurrent Neural Networks with Hessian-Free Optimization, 2011, ICML.
[63] David D. Cox, et al. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, 2013, ICML.
[64] Hermann Ney, et al. A Convergence Analysis of Log-Linear Training, 2011, NIPS.