Kaifeng Lyu | Jian Li
[1] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.
[2] Javier Peña, et al. Towards a deeper geometric, analytic and algorithmic understanding of margins, 2014, Optim. Methods Softw.
[3] Ohad Shamir, et al. Size-Independent Sample Complexity of Neural Networks, 2017, COLT.
[4] Yuxin Chen, et al. Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution, 2017, Found. Comput. Math.
[5] Nathan Srebro, et al. Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate, 2018, AISTATS.
[6] Kalyanmoy Deb, et al. Approximate KKT points and a proximity measure for termination, 2013, J. Glob. Optim.
[7] Pradeep Ravikumar, et al. Connecting Optimization and Regularization Paths, 2018, NeurIPS.
[8] Sanjeev Arora, et al. Implicit Regularization in Deep Matrix Factorization, 2019, NeurIPS.
[9] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[10] J. Zico Kolter, et al. A Continuous-Time View of Early Stopping for Least Squares Regression, 2018, AISTATS.
[11] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.
[12] Matus Telgarsky, et al. Risk and parameter convergence of logistic regression, 2018, arXiv.
[13] Guillermo Sapiro, et al. Robust Large Margin Deep Neural Networks, 2016, IEEE Transactions on Signal Processing.
[14] Nathan Srebro, et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[15] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[16] David A. McAllester, et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, 2017, ICLR.
[17] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, arXiv.
[18] Junwei Lu, et al. On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond, 2018, arXiv.
[19] L. van den Dries, et al. Geometric categories and o-minimal structures, 1996.
[20] H. B. Curry. The method of steepest descent for non-linear minimization problems, 1944.
[21] Bo Li, et al. Better Approximations of High Dimensional Smooth Functions by Deep Neural Networks with Rectified Power Units, 2019, Communications in Computational Physics.
[22] Colin Wei, et al. Regularization Matters: Generalization and Optimization of Neural Nets vs. their Induced Kernel, 2018, NeurIPS.
[23] Matus Telgarsky, et al. The implicit bias of gradient descent on nonseparable data, 2019, COLT.
[24] Guy Blanc, et al. Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process, 2019, COLT.
[25] Yoav Freund, et al. Boosting: Foundations and Algorithms, 2012.
[26] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[27] Francis Bach, et al. Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks, 2019, NeurIPS.
[28] Matus Telgarsky, et al. A refined primal-dual analysis of the implicit bias, 2019, arXiv.
[29] Yu. S. Ledyaev, et al. Nonsmooth analysis and control theory, 1998.
[30] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[31] Robert E. Mahony, et al. Convergence of the Iterates of Descent Methods for Analytic Cost Functions, 2005, SIAM J. Optim.
[32] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[33] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[34] Moustapha Cissé, et al. Parseval Networks: Improving Robustness to Adversarial Examples, 2017, ICML.
[35] Hongyang Zhang, et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations, 2017, COLT.
[36] Yoram Singer, et al. On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms, 2010, Machine Learning.
[37] Yi Zhou, et al. When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?, 2018.
[38] Ruslan Salakhutdinov, et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks, 2015, NIPS.
[39] Ji Zhu, et al. Margin Maximizing Loss Functions, 2003, NIPS.
[40] Cem Anil, et al. Sorting out Lipschitz function approximation, 2018, ICML.
[41] F. Clarke. Optimization and Nonsmooth Analysis, 1983.
[42] Andrew R. Barron, et al. Approximation by Combinations of ReLU and Squared ReLU Ridge Functions With $\ell^1$ and $\ell^0$ Controls, 2016, IEEE Transactions on Information Theory.
[43] M. Coste. An Introduction to O-minimal Geometry, 2002.
[44] Michael I. Jordan, et al. Gradient Descent Can Take Exponential Time to Escape Saddle Points, 2017, NIPS.
[45] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.
[46] Matus Telgarsky, et al. Margins, Shrinkage, and Boosting, 2013, ICML.
[47] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[48] J. Palis, et al. Geometric theory of dynamical systems: an introduction, 1984.
[49] F. Clarke. Generalized gradients and applications, 1975.
[50] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[51] Fabio Roli, et al. Evasion Attacks against Machine Learning at Test Time, 2013, ECML/PKDD.
[52] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[53] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[54] Jason D. Lee, et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation, 2018, ICML.
[55] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[56] Tengyu Ma, et al. Fixup Initialization: Residual Learning Without Normalization, 2019, ICLR.
[57] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[58] G. Zoutendijk, et al. Mathematical Programming Methods, 1976.
[59] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[60] David A. Wagner, et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, 2018, ICML.
[61] Dmitriy Drusvyatskiy, et al. Curves of Descent, 2012, SIAM J. Control. Optim.
[62] Colin Wei, et al. Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin, 2020, ICLR.
[63] Lorenzo Rosasco, et al. Theory III: Dynamics and Generalization in Deep Networks, 2019, arXiv.
[64] Wei Hu, et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced, 2018, NeurIPS.
[65] David A. Wagner, et al. Towards Evaluating the Robustness of Neural Networks, 2016, IEEE Symposium on Security and Privacy (SP).
[66] Dmitriy Drusvyatskiy, et al. Stochastic Subgradient Method Converges on Tame Functions, 2018, Foundations of Computational Mathematics.
[67] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[68] Nathan Srebro, et al. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models, 2019, ICML.