[1] Jürgen Schmidhuber,et al. Flat Minima , 1997, Neural Computation.
[2] Michael I. Jordan,et al. Gradient Descent Can Take Exponential Time to Escape Saddle Points , 2017, NIPS.
[3] Surya Ganguli,et al. An analytic theory of generalization dynamics and transfer learning in deep linear networks , 2018, ICLR.
[4] S. Smale. On the differential equations of species in competition , 1976, Journal of mathematical biology.
[5] David M. Blei,et al. Stochastic Gradient Descent as Approximate Bayesian Inference , 2017, J. Mach. Learn. Res.
[6] James A. Storer,et al. RePr: Improved Training of Convolutional Filters , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Raef Bassily,et al. On exponential convergence of SGD in non-convex over-parametrized learning , 2018, ArXiv.
[8] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Florent Krzakala,et al. Dynamics of stochastic gradient descent for two-layer neural networks in the teacher–student setup , 2019, NeurIPS.
[10] David Saad,et al. On-line learning in soft committee machines , 1995, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.
[11] Yuandong Tian,et al. Luck Matters: Understanding Training Dynamics of Deep ReLU Networks , 2019, ArXiv.
[12] Ohad Shamir,et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks , 2017, ICML.
[13] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[14] Yuandong Tian,et al. One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers , 2019, NeurIPS.
[15] John N. Tsitsiklis,et al. Gradient Convergence in Gradient Methods with Errors , 1999, SIAM J. Optim.
[16] Tengyu Ma,et al. Learning One-hidden-layer Neural Networks with Landscape Design , 2017, ICLR.
[17] Jascha Sohl-Dickstein,et al. Measuring the Effects of Data Parallelism on Neural Network Training , 2018, J. Mach. Learn. Res.
[18] G. Yin,et al. On competitive Lotka-Volterra model in random environments , 2009 .
[19] Quanquan Gu,et al. An Improved Analysis of Training Over-parameterized Deep Neural Networks , 2019, NeurIPS.
[20] Anthony C. C. Coolen,et al. Statistical mechanical analysis of the dynamics of learning in perceptrons , 1997, Stat. Comput.
[21] Thomas Laurent,et al. The Multilinear Structure of ReLU Networks , 2017, ICML.
[22] Michael I. Jordan,et al. How to Escape Saddle Points Efficiently , 2017, ICML.
[23] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[24] Roi Livni,et al. On the Computational Efficiency of Training Neural Networks , 2014, NIPS.
[25] Jascha Sohl-Dickstein,et al. SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability , 2017, NIPS.
[26] David Saad,et al. Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks , 1995, NIPS.
[27] Stefano Soatto,et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks , 2017, 2018 Information Theory and Applications Workshop (ITA).
[28] Ruslan Salakhutdinov,et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks , 2015, NIPS.
[29] Michael Carbin,et al. The Lottery Ticket Hypothesis: Training Pruned Neural Networks , 2018, ArXiv.
[30] Suvrit Sra,et al. Small nonlinearities in activation functions create bad local minima in neural networks , 2018, ICLR.
[31] David Saad,et al. Online Learning in Radial Basis Function Networks , 1997, Neural Computation.
[32] Thomas Hofmann,et al. Escaping Saddles with Stochastic Gradients , 2018, ICML.
[33] Jason Yosinski,et al. Measuring the Intrinsic Dimension of Objective Landscapes , 2018, ICLR.
[34] Thomas Laurent,et al. Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global , 2017, ICML.
[35] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[36] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[37] Suvrit Sra,et al. Global optimality conditions for deep neural networks , 2017, ICLR.
[38] Yuan Cao,et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks , 2019, NeurIPS.
[39] Raef Bassily,et al. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning , 2017, ICML.
[40] Arthur Jacot,et al. Neural tangent kernel: convergence and generalization in neural networks (invited paper) , 2018, NeurIPS.
[41] Y. Hosono,et al. The minimal speed of traveling fronts for a diffusive Lotka-Volterra competition model , 1998 .
[42] Gregory J. Wolff,et al. Optimal Brain Surgeon and general network pruning , 1993, IEEE International Conference on Neural Networks.
[43] Yuanzhi Li,et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[44] Elad Hazan,et al. An optimal algorithm for stochastic strongly-convex optimization , 2010, arXiv:1006.2425.
[45] Yuan Cao,et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks , 2018, ArXiv.
[46] Gintare Karolina Dziugaite,et al. The Lottery Ticket Hypothesis at Scale , 2019, ArXiv.
[47] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[48] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.
[49] Mingjie Sun,et al. Rethinking the Value of Network Pruning , 2018, ICLR.
[50] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[51] Yann Ollivier,et al. Natural Langevin Dynamics for Neural Networks , 2017, GSI.
[52] Wei Hu,et al. A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks , 2018, ICLR.
[53] Yuandong Tian,et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima , 2017, ICML.
[54] Sanjeev Arora,et al. Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets , 2019, NeurIPS.
[55] Lei Wu. How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective , 2018 .
[56] Nicolas Macris,et al. The committee machine: computational to statistical gaps in learning a two-layers neural network , 2018, NeurIPS.
[57] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[58] Yuandong Tian,et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis , 2017, ICML.
[59] Inderjit S. Dhillon,et al. Recovery Guarantees for One-hidden-layer Neural Networks , 2017, ICML.
[60] Yee Whye Teh,et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.
[61] Mikhail Belkin,et al. MaSS: an Accelerated Stochastic Method for Over-parametrized Learning , 2018, ArXiv.
[62] Jason Yosinski,et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask , 2019, NeurIPS.
[63] Matthias Hein,et al. The Loss Surface of Deep and Wide Neural Networks , 2017, ICML.
[64] Andrew M. Saxe,et al. High-dimensional dynamics of generalization error in neural networks , 2017, Neural Networks.
[65] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[66] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[67] Stefano Soatto,et al. Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.
[68] Rui Peng,et al. Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures , 2016, ArXiv.
[69] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[70] Yann LeCun,et al. Optimal Brain Damage , 1989, NIPS.
[71] Hod Lipson,et al. Convergent Learning: Do different neural networks learn the same representations? , 2015, FE@NIPS.
[72] Francis Bach,et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport , 2018, NeurIPS.
[73] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.