How Implicit Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part I: the 1-D Case of Two Layers with Random First Layer