Jorge Nocedal | Mikhail Smelyanskiy | Ping Tak Peter Tang | Dheevatsa Mudigere | Nitish Shirish Keskar
[1] J. Rissanen. A Universal Prior for Integers and Estimation by Minimum Description Length, 1983.
[2] Michael C. Ferris, et al. Weak sharp minima and penalty functions in mathematical programming, 1988.
[3] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.
[4] David J. C. MacKay, et al. A Practical Bayesian Framework for Backpropagation Networks, 1992, Neural Computation.
[5] Jorge Nocedal, et al. A Limited Memory Algorithm for Bound Constrained Optimization, 1995, SIAM J. Sci. Comput.
[6] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[7] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[8] Léon Bottou. Online Learning and Stochastic Approximations, 1998.
[9] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[10] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.
[11] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[12] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[13] Dimitris Bertsimas, et al. Robust Optimization for Unconstrained Simulation-Based Problems, 2010, Oper. Res.
[14] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[15] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[16] Andrew Y. Ng, et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.
[17] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[18] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[19] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[20] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[21] Carla Teixeira Lopes, et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 2012.
[22] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[23] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[24] Mark W. Schmidt, et al. Hybrid Deterministic-Stochastic Methods for Data Fitting, 2011, SIAM J. Sci. Comput.
[25] Jorge Nocedal, et al. Sample size selection in optimization methods for machine learning, 2012, Math. Program.
[26] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[27] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[29] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[30] Jean-Philippe Vial, et al. Robust Optimization, 2021, ICORES.
[31] Alexander J. Smola, et al. Efficient mini-batch training for stochastic optimization, 2014, KDD.
[32] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[33] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[34] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[35] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[36] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[37] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.
[38] Uri Shaham, et al. Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization, 2015, arXiv.
[39] Yann LeCun, et al. Deep learning with Elastic Averaging SGD, 2014, NIPS.
[40] Samy Bengio, et al. Show and tell: A neural image caption generator, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] John Wright, et al. When Are Nonconvex Problems Not Scary?, 2015, arXiv.
[42] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[43] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[44] Hossein Mobahi, et al. Training Recurrent Neural Networks by Diffusion, 2016, arXiv.
[45] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Albert S. Berahas, et al. adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs, 2015, ECML/PKDD.
[47] Daniel Soudry, et al. No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016, arXiv.
[48] Michael I. Jordan, et al. Gradient Descent Converges to Minimizers, 2016, arXiv.
[49] Yang Song, et al. Improving the Robustness of Deep Neural Networks via Stability Training, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Pradeep Dubey, et al. Distributed Deep Learning Using Synchronous Stochastic Gradient Descent, 2016, arXiv.
[51] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[52] Omer Levy, et al. Simulating Action Dynamics with Neural Process Networks, 2018, ICLR.
[53] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.