[1] Sebastian Ruder. An overview of gradient descent optimization algorithms, 2016, ArXiv.
[2] Dmitry Yarotsky. Error bounds for approximations with deep ReLU networks, 2016, Neural Networks.
[3] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[4] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[5] D. Jupp. Approximation to Data by Splines with Free Knots, 1978.
[6] Yuan Yang, et al. Influence Area of Overlap Singularity in Multilayer Perceptrons, 2018, IEEE Access.
[7] Shun-ichi Amari, et al. Dynamics of Learning in Multilayer Perceptrons Near Singularities, 2008, IEEE Transactions on Neural Networks.
[8] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[9] Saad, et al. On-line learning in soft committee machines, 1995, Physical Review E.
[10] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[11] George Cybenko. Approximation by superpositions of a sigmoidal function, 1989, Math. Control Signals Syst.
[12] Andrew R. Barron. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.
[13] J. Aubin, et al. Differential Inclusions: Set-Valued Maps and Viability Theory, 1984.
[14] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks, 2018, NeurIPS.
[15] Justin A. Sirignano, et al. Mean field analysis of neural networks: A central limit theorem, 2018, Stochastic Processes and their Applications.
[16] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, ArXiv.
[17] Aleksej F. Filippov. Differential Equations with Discontinuous Righthand Sides, 1988, Mathematics and Its Applications.
[18] Grant M. Rotskoff, et al. Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error, 2018, ArXiv.
[19] Gene H. Golub, et al. Matrix Computations, 1983.
[20] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[21] Mauro Perego, et al. Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint, 2019, MSML.
[22] Philipp Petersen, et al. Optimal approximation of piecewise smooth functions using deep ReLU neural networks, 2017, Neural Networks.
[23] Juncai He, et al. ReLU Deep Neural Networks and Linear Finite Elements, 2020, Journal of Computational Mathematics.
[25] Yuki Yoshida, et al. Data-dependence of plateau phenomenon in learning with neural network—statistical mechanical analysis, 2020, NeurIPS.
[26] Andrea Montanari, et al. Limitations of Lazy Training of Two-layers Neural Networks, 2019, NeurIPS.
[27] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[28] Grégoire Montavon, et al. Neural Networks: Tricks of the Trade, 2012, Lecture Notes in Computer Science.
[29] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[30] H. N. Mhaskar. Neural Networks for Optimal Approximation of Smooth and Analytic Functions, 1996, Neural Computation.
[31] G. Teschl. Ordinary Differential Equations and Dynamical Systems, 2012.
[33] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[34] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[35] Allan Pinkus. Approximation theory of the MLP model in neural networks, 1999, Acta Numerica.
[36] Joan Bruna, et al. Gradient Dynamics of Shallow Univariate ReLU Networks, 2019, NeurIPS.
[37] Shun-ichi Amari, et al. Dynamics of Learning Near Singularities in Layered Networks, 2008, Neural Computation.
[38] Samet Oymak, et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks, 2019, IEEE Journal on Selected Areas in Information Theory.
[39] Kenji Fukumizu, et al. Local minima and plateaus in hierarchical structures of multilayer perceptrons, 2000, Neural Networks.
[40] Kenji Fukumizu, et al. Adaptive natural gradient learning algorithms for various stochastic models, 2000, Neural Networks.
[41] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[42] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.