A note on Linear Bottleneck networks and their Transition to Multilinearity
[1] Dong Yu et al. Improved Bottleneck Features Using Pretrained Deep Neural Networks, 2011, INTERSPEECH.
[2] Liwei Wang et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[3] Quanquan Gu et al. An Improved Analysis of Training Over-parameterized Deep Neural Networks, 2019, NeurIPS.
[4] Wei Hu et al. A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks, 2018, ICLR.
[5] Ruosong Wang et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[6] Sanjeev Arora et al. Implicit Regularization in Deep Matrix Factorization, 2019, NeurIPS.
[7] Jian Sun et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Caroline Uhler et al. On Alignment in Deep Linear Neural Networks, 2020.
[9] Mikhail Belkin et al. On the linearity of large non-linear models: when and why the tangent kernel is constant, 2020, NeurIPS.
[10] Yuanzhi Li et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[11] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.
[12] Sri Harish Reddy Mallidi et al. Neural Network Bottleneck Features for Language Identification, 2014, Odyssey.
[13] Yuan Cao et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks, 2019, NeurIPS.
[14] Guigang Zhang et al. Deep Learning, 2016, Int. J. Semantic Comput.
[15] Jianqing Fan et al. High-Dimensional Statistics, 2014.
[16] Matus Telgarsky et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[17] Wei Hu et al. Width Provably Matters in Optimization for Deep Linear Neural Networks, 2019, ICML.
[18] Francis Bach et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[19] Philip M. Long et al. Gradient Descent with Identity Initialization Efficiently Learns Positive-Definite Linear Transformations by Deep Residual Networks, 2018, Neural Computation.
[20] Nathan Srebro et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[21] Mikhail Belkin et al. Loss landscapes and optimization in over-parameterized non-linear systems and neural networks, 2020, Applied and Computational Harmonic Analysis.
[22] Prateek Jain et al. Phase Retrieval Using Alternating Minimization, 2013, IEEE Transactions on Signal Processing.
[23] Nathan Srebro et al. Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy, 2020, NeurIPS.
[24] Prateek Jain et al. Low-rank matrix completion using alternating minimization, 2012, STOC '13.
[25] Francis Bach et al. Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks, 2019, NeurIPS.
[26] Guilherme França et al. Understanding the Dynamics of Gradient Flow in Overparameterized Linear Models, 2021, ICML.
[27] Arthur Jacot et al. Neural tangent kernel: convergence and generalization in neural networks, 2018, NeurIPS.