Exact solutions of a deep linear network
[1] Liu Ziyin, et al. spred: Solving L1 Penalty with SGD, 2022, ICML.
[2] Hidenori Tanaka, et al. What shapes the loss landscape of self-supervised learning?, 2022, ICLR.
[3] Liu Ziyin, et al. Exact Phase Transitions in Deep Learning, 2022, arXiv.
[4] Akshay Rangamani, et al. Neural Collapse in Deep Homogeneous Classifiers and The Role of Weight Decay, 2022, ICASSP 2022.
[5] Zihao Wang, et al. Posterior Collapse of a Linear Latent Variable Model, 2022, NeurIPS.
[6] Eric P. Xing, et al. Stochastic Neural Networks with Infinite Width are Deterministic, 2022, arXiv.
[7] Andrej Risteski, et al. Variational autoencoders in the presence of low-dimensional data: landscape and implicit bias, 2021, ICLR.
[8] James B. Simon, et al. SGD with a Constant Large Learning Rate Can Converge to Local Maxima, 2021, arXiv:2107.11774.
[9] Takashi Mori, et al. Power-Law Escape Rate of SGD, 2021, ICML.
[10] Bo Liu, et al. Spurious Local Minima Are Common for Deep Neural Networks with Piecewise Linear Activations, 2021, IEEE Transactions on Neural Networks and Learning Systems.
[11] Liu Ziyin, et al. Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent, 2020, ICML.
[12] Daniel L. K. Yamins, et al. Pruning neural networks without any data by iteratively conserving synaptic flow, 2020, NeurIPS.
[13] Dacheng Tao, et al. Piecewise linear activations substantially shape the loss surfaces of neural networks, 2020, ICLR.
[14] Nathan Srebro, et al. Dropout: Explicit Forms and Capacity Control, 2020, ICML.
[15] Mohammad Norouzi, et al. Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse, 2019, NeurIPS.
[16] Joan Bruna, et al. Spurious Valleys in One-hidden-layer Neural Network Optimization Landscapes, 2019, J. Mach. Learn. Res.
[17] Raman Arora, et al. On Dropout and Nuclear Norm Regularization, 2019, ICML.
[18] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[19] D. Wipf, et al. Diagnosing and Enhancing VAE Models, 2019, ICLR.
[20] Tingting Tang, et al. The Loss Surface of Deep Linear Networks Viewed Through the Algebraic Geometry Lens, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[21] Richard Socher, et al. A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation, 2018, ICLR.
[22] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[23] Suvrit Sra, et al. Small nonlinearities in activation functions create bad local minima in neural networks, 2018, ICLR.
[24] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[25] Thomas Laurent, et al. Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global, 2017, ICML.
[26] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[27] Alexander A. Alemi, et al. Fixing a Broken ELBO, 2017, ICML.
[28] B. Haeffele, et al. Dropout as a Low-Rank Regularizer for Matrix Factorization, 2017, AISTATS.
[29] Amirhossein Taghvaei, et al. How regularization affects the critical points in linear networks, 2017, NIPS.
[30] Kenji Kawaguchi, et al. Depth Creates No Bad Local Minima, 2017, arXiv.
[31] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[32] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[33] Yann LeCun, et al. Open Problem: The landscape of the loss surfaces of multilayer networks, 2015, COLT.
[34] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[35] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[36] Aaron C. Courville, et al. Generative Adversarial Nets, 2014, NIPS.
[37] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[38] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[39] Eugene Wong, et al. Stochastic neural networks, 2009, Algorithmica.
[40] Anders Krogh, et al. A Simple Weight Decay Can Improve Generalization, 1991, NIPS.
[41] E. Fama. Efficient Capital Markets: A Review of Theory and Empirical Work, 1970.
[42] Zihao Wang, et al. Sparsity by Redundancy: Solving L1 with a Simple Reparametrization, 2022, arXiv.
[43] Yuandong Tian. Deep Contrastive Learning is Provably (almost) Principal Component Analysis, 2022, arXiv.
[44] James B. Simon, et al. SGD Can Converge to Local Maxima, 2022, ICLR.
[45] Liu Ziyin, et al. SGD May Never Escape Saddle Points, 2021, arXiv.
[46] Mengjia Xu, et al. Dynamics and Neural Collapse in Deep Classifiers trained with the Square Loss, 2021.
[47] Quoc V. Le, et al. Searching for Activation Functions, 2018, arXiv.
[48] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[49] Peter Cheeseman, et al. Bayesian Methods for Adaptive Models, 2011.
[50] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.