Massimo Fornasier | Timo Klock | Christian Fiedler | Michael Rauchensteiner
[1] Verner Vlačić,et al. Affine symmetries and neural network identifiability , 2021 .
[2] O. Papaspiliopoulos. High-Dimensional Probability: An Introduction with Applications in Data Science , 2020 .
[3] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[4] Charles R. Johnson,et al. Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.
[5] Levent Tunçel,et al. Optimization algorithms on matrix manifolds , 2009, Math. Comput..
[6] Yi Ma,et al. Robust principal component analysis? , 2009, JACM.
[7] Jan Vybíral,et al. Learning Functions of Few Arbitrary Linear Parameters in High Dimensions , 2010, Found. Comput. Math..
[8] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Erich Elsen,et al. The State of Sparsity in Deep Neural Networks , 2019, ArXiv.
[10] T. Poggio,et al. Deep vs. shallow networks: An approximation theory perspective , 2016, ArXiv.
[11] G. Petrova,et al. Nonlinear Approximation and (Deep) ReLU Networks , 2019, Constructive Approximation.
[12] Suvrit Sra,et al. Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity , 2018, NeurIPS.
[13] Philipp Petersen,et al. Optimal approximation of piecewise smooth functions using deep ReLU neural networks , 2017, Neural Networks.
[14] Joao M. Pereira,et al. Subspace power method for symmetric tensor decomposition and generalized PCA , 2019, ArXiv.
[15] Verner Vlačić,et al. Neural Network Identifiability for a Family of Sigmoidal Nonlinearities , 2019, Constructive Approximation.
[16] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[17] Anima Anandkumar,et al. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods , 2017 .
[18] Helmut Bölcskei,et al. Deep Neural Network Approximation Theory , 2019, IEEE Transactions on Information Theory.
[19] Helmut Bölcskei,et al. Optimal Approximation with Sparsely Connected Deep Neural Networks , 2017, SIAM J. Math. Data Sci..
[20] Alexander Cloninger,et al. Provable approximation properties for deep neural networks , 2015, ArXiv.
[21] Roman Vershynin,et al. Memory Capacity of Neural Networks with Threshold and Rectified Linear Unit Activations , 2020, SIAM J. Math. Data Sci..
[22] Andrea Montanari,et al. A mean field view of the landscape of two-layer neural networks , 2018, Proceedings of the National Academy of Sciences.
[23] Sanjeev Arora,et al. Implicit Regularization in Deep Matrix Factorization , 2019, NeurIPS.
[24] Alex Gittens,et al. Tail Bounds for All Eigenvalues of a Sum of Random Matrices , 2011, arXiv:1104.4513.
[25] Marco Mondelli,et al. Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks , 2020, ICML.
[26] Massimo Fornasier,et al. Robust and Resource-Efficient Identification of Two Hidden Layer Neural Networks , 2019, Constructive Approximation.
[27] Nathan Srebro,et al. Kernel and Rich Regimes in Overparametrized Models , 2019, COLT.
[28] J. Stephen Judd,et al. On the complexity of loading shallow neural networks , 1988, J. Complex..
[29] Adel Javanmard,et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks , 2017, IEEE Transactions on Information Theory.
[30] Adam R. Klivans,et al. Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection , 2020, ICML.
[31] H. Rauhut,et al. Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers , 2019, Information and Inference: A Journal of the IMA.
[32] Ronald L. Rivest,et al. Training a 3-node neural network is NP-complete , 1988, COLT '88.
[33] Samet Oymak,et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks , 2019, IEEE Journal on Selected Areas in Information Theory.
[34] Wei Hu,et al. A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks , 2018, ICLR.
[35] Marco Mondelli,et al. Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology , 2020, NeurIPS.
[36] Roman Vershynin,et al. High-Dimensional Probability , 2018 .
[37] Geoffrey E. Hinton,et al. Learning Representations by Recirculation , 1987, NIPS.
[38] Nathan Srebro,et al. The Implicit Bias of Gradient Descent on Separable Data , 2017, J. Mach. Learn. Res..
[39] G. Stewart. Perturbation theory for the singular value decomposition , 1990 .
[40] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[41] Andrea Montanari,et al. On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition , 2018, AISTATS.
[42] Jan Vybíral,et al. Identification of Shallow Neural Networks by Fewest Samples , 2018, Information and Inference: A Journal of the IMA.
[43] Nathan Srebro,et al. Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy , 2020, NeurIPS.
[44] Boris Hanin,et al. Neural network approximation , 2020, Acta Numerica.
[45] A. Pinkus,et al. Identifying Linear Combinations of Ridge Functions , 1999 .
[46] Ruoyu Sun,et al. Optimization for deep learning: theory and algorithms , 2019, ArXiv.
[47] C. Chui,et al. Approximation by ridge functions and neural networks with one hidden layer , 1992 .
[48] Constantine Caramanis,et al. Robust PCA via Outlier Pursuit , 2010, IEEE Transactions on Information Theory.
[49] Eduardo D. Sontag,et al. Uniqueness of weights for neural networks , 1993 .
[50] C. Fefferman. Reconstructing a neural net from its output , 1994 .
[51] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[52] Héctor J. Sussmann,et al. Uniqueness of the weights for minimal feedforward nets with a given input-output map , 1992, Neural Networks.
[53] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[54] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[55] Inderjit S. Dhillon,et al. Recovery Guarantees for One-hidden-layer Neural Networks , 2017, ICML.
[56] Alexander Cloninger,et al. ReLU nets adapt to intrinsic dimensionality beyond the target domain , 2020, ArXiv.
[57] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[58] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[59] Joan Bruna,et al. Intriguing properties of neural networks , 2013, ICLR.
[60] Yann LeCun,et al. Une procedure d'apprentissage pour reseau a seuil asymmetrique (A learning scheme for asymmetric threshold networks) , 1985 .
[61] Arnulf Jentzen,et al. Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations , 2018, SIAM J. Math. Data Sci..
[62] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[63] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[64] H. N. Mhaskar,et al. Function approximation by deep networks , 2019, ArXiv.
[65] J. Knott. The organization of behavior: A neuropsychological theory , 1951 .
[66] P. Wedin. Perturbation bounds in connection with singular value decomposition , 1972 .
[67] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[68] David Rolnick,et al. Reverse-engineering deep ReLU networks , 2019, ICML.
[69] Ryota Tomioka,et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning , 2014, ICLR.
[70] Prateek Jain,et al. Non-convex Robust PCA , 2014, NIPS.
[71] Anima Anandkumar,et al. Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates , 2014, ArXiv.
[72] Guang-Bin Huang,et al. Learning capability and storage capacity of two-hidden-layer feedforward networks , 2003, IEEE Trans. Neural Networks.
[73] Arnulf Jentzen,et al. DNN Expression Rate Analysis of High-Dimensional PDEs: Application to Option Pricing , 2018, Constructive Approximation.