暂无分享,去创建一个
[1] James Martens,et al. New Insights and Perspectives on the Natural Gradient Method , 2014, J. Mach. Learn. Res..
[2] Yann LeCun,et al. Improving the convergence of back-propagation learning with second-order methods , 1989 .
[3] Kenji Fukumizu,et al. Local minima and plateaus in hierarchical structures of multilayer perceptrons , 2000, Neural Networks.
[4] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[5] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[6] J. Blanchet,et al. Convergence Rate Analysis of a Stochastic Trust Region Method for Nonconvex Optimization , 2016 .
[7] Michael I. Jordan,et al. Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent , 2017, COLT.
[8] Georgios Piliouras,et al. Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions , 2016, ITCS.
[9] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[10] Thomas Hofmann,et al. Escaping Saddles with Stochastic Gradients , 2018, ICML.
[11] Roger B. Grosse,et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature , 2015, ICML.
[12] Nicholas I. M. Gould,et al. How much patience do you have? A worst-case perspective on smooth nonconvex optimization , 2012 .
[13] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[14] Mohammad Bagher Menhaj,et al. Training feedforward networks with the Marquardt algorithm , 1994, IEEE Trans. Neural Networks.
[15] Satoshi Matsuoka,et al. Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs , 2018, ArXiv.
[16] Xiaoxia Wu,et al. L ] 1 0 A pr 2 01 9 AdaGrad-Norm convergence over nonconvex landscapes AdaGrad stepsizes : sharp convergence over nonconvex landscapes , from any initialization , 2019 .
[17] Amir Globerson,et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs , 2017, ICML.
[18] Michael I. Jordan,et al. Gradient Descent Can Take Exponential Time to Escape Saddle Points , 2017, NIPS.
[19] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[20] O. Chapelle. Improved Preconditioner for Hessian Free Optimization , 2011 .
[21] Stuart E. Dreyfus,et al. Second-order stagewise backpropagation for Hessian-matrix analyses and investigation of negative curvature , 2008, Neural Networks.
[22] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[23] Yair Carmon,et al. "Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions , 2017, ICML.
[24] Thomas Hofmann,et al. Local Saddle Point Optimization: A Curvature Exploitation Approach , 2018, AISTATS.
[25] Gabriel Goh,et al. Why Momentum Really Works , 2017 .
[26] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[27] Ohad Shamir,et al. Failures of Gradient-Based Deep Learning , 2017, ICML.
[28] Zeyuan Allen-Zhu,et al. Natasha 2: Faster Non-Convex Optimization Than SGD , 2017, NeurIPS.
[29] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[30] Yair Carmon,et al. Lower bounds for finding stationary points I , 2017, Mathematical Programming.
[31] Yurii Nesterov,et al. Accelerating the cubic regularization of Newton’s method on convex problems , 2005, Math. Program..
[32] Nicholas I. M. Gould,et al. Trust Region Methods , 2000, MOS-SIAM Series on Optimization.
[33] A. Conv. A Kronecker-factored approximate Fisher matrix for convolution layers , 2016 .
[34] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[35] Thomas Hofmann,et al. A Distributed Second-Order Algorithm You Can Trust , 2018, ICML.
[36] Dit-Yan Yeung,et al. Collaborative Deep Learning for Recommender Systems , 2014, KDD.
[37] Nicol N. Schraudolph,et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent , 2002, Neural Computation.
[38] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[39] Katya Scheinberg,et al. Stochastic optimization using a trust-region method and random models , 2015, Mathematical Programming.
[40] Alexander J. Smola,et al. Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.
[41] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[42] Yi Zhang,et al. The Case for Full-Matrix Adaptive Regularization , 2018, ArXiv.
[43] Daniel P. Robinson,et al. A trust region algorithm with a worst-case iteration complexity of O(ϵ-3/2)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{docume , 2016, Mathematical Programming.
[44] Gerd Hirzinger,et al. Solving the Ill-Conditioning in Neural Network Learning , 1996, Neural Networks: Tricks of the Trade.
[45] Yuanzhi Li,et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation , 2017, NIPS.
[46] Cho-Jui Hsieh,et al. Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient , 2018, ArXiv.
[47] Michael I. Jordan,et al. Gradient Descent Only Converges to Minimizers , 2016, COLT.
[48] Aurélien Lucchi,et al. Sub-sampled Cubic Regularization for Non-convex Optimization , 2017, ICML.
[49] Jason D. Lee,et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation , 2018, ICML.
[50] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[51] E. Wigner. Characteristic Vectors of Bordered Matrices with Infinite Dimensions I , 1955 .
[52] Francesco Orabona,et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes , 2018, AISTATS.
[53] Daniel P. Robinson,et al. How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization , 2018 .
[54] T. Steihaug. The Conjugate Gradient Method and Trust Regions in Large Scale Optimization , 1983 .
[55] H. Robbins. A Stochastic Approximation Method , 1951 .
[56] Peng Xu,et al. Newton-type methods for non-convex optimization under inexact Hessian information , 2017, Math. Program..
[57] Yurii Nesterov,et al. Cubic regularization of Newton method and its global performance , 2006, Math. Program..
[58] Ya-Xiang Yuan,et al. Recent advances in trust region algorithms , 2015, Mathematical Programming.
[59] Katya Scheinberg,et al. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models , 2015, Mathematical Programming.
[60] Yoshua Bengio,et al. Three Factors Influencing Minima in SGD , 2017, ArXiv.
[61] L. N. Vicente,et al. Complexity and global rates of trust-region methods based on probabilistic models , 2018 .
[62] Nicholas I. M. Gould,et al. Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results , 2011, Math. Program..
[63] Razvan Pascanu,et al. Revisiting Natural Gradient for Deep Networks , 2013, ICLR.
[64] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[65] Y. Saad,et al. An estimator for the diagonal of a matrix , 2007 .
[66] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[67] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[68] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[69] Nicholas I. M. Gould,et al. Complexity bounds for second-order optimality in unconstrained optimization , 2012, J. Complex..
[70] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[71] Richard Socher,et al. Regularizing and Optimizing LSTM Language Models , 2017, ICLR.
[72] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[73] Stephen J. Wright,et al. Numerical Optimization , 2018, Fundamental Statistical Inference.
[74] Peng Xu,et al. Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study , 2017, SDM.
[75] Peng Xu,et al. Inexact Non-Convex Newton-Type Methods , 2018, 1802.06925.
[76] Daniel P. Robinson,et al. Exploiting negative curvature in deterministic and stochastic optimization , 2017, Mathematical Programming.
[77] Nicolas Le Roux,et al. Negative eigenvalues of the Hessian in deep neural networks , 2018, ICLR.