Student Specialization in Deep ReLU Networks With Finite Width and Input Dimension