[1] Matus Telgarsky,et al. Gradient descent aligns the layers of deep linear networks , 2018, ICLR.
[2] Xiaohan Wei,et al. Structured Signal Recovery From Non-Linear and Heavy-Tailed Measurements , 2016, IEEE Transactions on Information Theory.
[3] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[4] Yue Zhang,et al. On the Consistency of Feature Selection With Lasso for Non-linear Targets , 2016, ICML.
[5] Sen Na,et al. High-dimensional Varying Index Coefficient Models via Stein's Identity , 2018, J. Mach. Learn. Res..
[6] Jianqing Fan,et al. Robust high dimensional factor models with applications to statistical machine learning , 2018, Statistical Science.
[7] Christos Thrampoulidis,et al. LASSO with Non-linear Measurements is Equivalent to One With Linear Measurements , 2015, NIPS.
[8] Nathan Srebro,et al. Implicit Regularization in Matrix Factorization , 2017, 2018 Information Theory and Applications Workshop (ITA).
[9] Jianqing Fan,et al. Robust Covariance Estimation for Approximate Factor Models , 2016, Journal of Econometrics.
[10] Kaifeng Lyu,et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks , 2019, ICLR.
[11] Varun Kanade,et al. Implicit Regularization for Optimal Sparse Recovery , 2019, NeurIPS.
[12] Francis Bach,et al. Slice inverse regression with score functions , 2018 .
[13] Mikhail Belkin,et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off , 2018, Proceedings of the National Academy of Sciences.
[14] Jun S. Liu,et al. Sparse Sliced Inverse Regression via Lasso , 2016, Journal of the American Statistical Association.
[15] Philip M. Long,et al. Benign overfitting in linear regression , 2019, Proceedings of the National Academy of Sciences.
[16] Babak Hassibi,et al. Stochastic Mirror Descent on Overparameterized Nonlinear Models , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[17] Nikolaos Doulamis,et al. Deep Learning for Computer Vision: A Brief Review , 2018, Comput. Intell. Neurosci..
[18] Zhaoran Wang,et al. A convex formulation for high‐dimensional sparse sliced inverse regression , 2018, ArXiv.
[19] Ker-Chau Li,et al. Regression Analysis Under Link Violation , 1989 .
[20] Yonina C. Eldar,et al. Phase Retrieval via Matrix Completion , 2011, SIAM Rev..
[21] Zhaoran Wang,et al. Agnostic Estimation for Misspecified Phase Retrieval Models , 2020, NIPS.
[22] Krishnakumar Balasubramanian,et al. Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein's Lemma , 2017, NIPS.
[23] Wei Hu,et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced , 2018, NeurIPS.
[24] Arthur Jacot,et al. Neural tangent kernel: convergence and generalization in neural networks (invited paper) , 2018, NeurIPS.
[25] Yuxi Li,et al. Deep Reinforcement Learning: An Overview , 2017, ArXiv.
[26] A. Bandeira,et al. Sharp nonasymptotic bounds on the norm of random matrices with independent entries , 2014, arXiv:1408.6185.
[27] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[28] P. Diaconis,et al. Use of exchangeable pairs in the analysis of simulations , 2004 .
[29] Tuo Zhao,et al. Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? - A Neural Tangent Kernel Perspective , 2020, NeurIPS.
[30] Cong Ma,et al. A Selective Overview of Deep Learning , 2019, Statistical Science.
[31] Yaniv Plan,et al. Robust 1-bit Compressed Sensing and Sparse Logistic Regression: A Convex Programming Approach , 2012, IEEE Transactions on Information Theory.
[32] Xiaohan Wei,et al. Estimation of the covariance structure of heavy-tailed distributions , 2017, NIPS.
[33] R. Cook,et al. Dimension Reduction in Binary Response Regression , 1999 .
[34] Edward A. Fox,et al. Natural Language Processing Advancements By Deep Learning: A Survey , 2020, ArXiv.
[35] Konstantinos Spiliopoulos,et al. Mean Field Analysis of Neural Networks: A Law of Large Numbers , 2018, SIAM J. Appl. Math..
[36] Andrea Montanari,et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve , 2019, Communications on Pure and Applied Mathematics.
[37] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.
[38] Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty , 2010, arXiv:1002.4734.
[39] Justin A. Sirignano,et al. Mean Field Analysis of Neural Networks: A Law of Large Numbers , 2018, SIAM J. Appl. Math..
[40] Krishnakumar Balasubramanian,et al. High-dimensional Non-Gaussian Single Index Models via Thresholded Score Function Estimation , 2017, ICML.
[41] Colin Wei,et al. Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel , 2018, NeurIPS.
[42] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[43] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[44] Tengyuan Liang,et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize , 2018, The Annals of Statistics.
[45] Christos Thrampoulidis,et al. The Generalized Lasso for Sub-gaussian Observations with Dithered Quantization , 2018, 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[46] Michael W. Mahoney,et al. Exact expressions for double descent and implicit regularization via surrogate random design , 2019, NeurIPS.
[47] Ruosong Wang,et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[48] W. Härdle,et al. Optimal Smoothing in Single-index Models , 1993 .
[49] V. Koltchinskii,et al. Nuclear norm penalization and optimal rates for noisy low rank matrix completion , 2010, arXiv:1011.6256.
[50] Jianqing Fan,et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .
[51] Christos Thrampoulidis,et al. A Model of Double Descent for High-dimensional Binary Linear Classification , 2019, ArXiv.
[52] Pablo A. Parrilo,et al. Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..
[53] Francis Bach,et al. On Lazy Training in Differentiable Programming , 2018, NeurIPS.
[54] R. Cook,et al. Principal Hessian Directions Revisited , 1998 .
[55] Ker-Chau Li,et al. Sliced Inverse Regression for Dimension Reduction , 1991 .
[56] H. Zou,et al. Strong Oracle Optimality of Folded Concave Penalized Estimation , 2012, Annals of Statistics.
[57] R. Cook,et al. Sufficient Dimension Reduction via Inverse Regression , 2005 .
[58] Nathan Srebro,et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning , 2017, NIPS.
[59] Jaehoon Lee,et al. Wide neural networks of any depth evolve as linear models under gradient descent , 2019, NeurIPS.
[60] P. Zhao,et al. Implicit regularization via Hadamard product over-parametrization in high-dimensional linear regression , 2019 .
[61] Francis Bach,et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss , 2020, COLT.
[62] Matus Telgarsky,et al. The implicit bias of gradient descent on nonseparable data , 2019, COLT.
[63] Roman Vershynin,et al. High-Dimensional Probability , 2018 .
[64] Francis Bach,et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport , 2018, NeurIPS.
[65] Martin Genzel,et al. High-Dimensional Estimation of Structured Signals From Non-Linear Observations With General Convex Loss Functions , 2016, IEEE Transactions on Information Theory.
[66] Marc Peter Deisenroth,et al. Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.
[67] Andrea Montanari,et al. Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit , 2019, COLT.
[68] Yuxin Chen,et al. Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution , 2017, Found. Comput. Math..
[69] Stanislav Minsker,et al. Robust modifications of U-statistics and applications to covariance estimation problems , 2018, Bernoulli.
[70] Lorenzo Rosasco,et al. Theory of Deep Learning III: explaining the non-overfitting puzzle , 2017, ArXiv.
[71] Christos Thrampoulidis,et al. Analytic Study of Double Descent in Binary Classification: The Impact of Loss , 2020, 2020 IEEE International Symposium on Information Theory (ISIT).
[72] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[73] Quanquan Gu,et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks , 2019, AAAI.
[74] C. Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables , 1972 .
[75] Jugal K. Kalita,et al. A Survey of the Usages of Deep Learning for Natural Language Processing , 2018, IEEE Transactions on Neural Networks and Learning Systems.
[76] Martin J. Wainwright,et al. Minimax Rates of Estimation for High-Dimensional Linear Regression Over $\ell_q$ -Balls , 2009, IEEE Transactions on Information Theory.
[77] G. Lugosi,et al. Empirical risk minimization for heavy-tailed losses , 2014, arXiv:1406.2462.
[78] Samet Oymak,et al. Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? , 2018, ICML.
[79] Y. Plan,et al. High-dimensional estimation with geometric constraints , 2014, arXiv:1404.3749.
[80] Nathan Srebro,et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks , 2018, NeurIPS.
[81] Andrea Montanari,et al. A mean field view of the landscape of two-layer neural networks , 2018, Proceedings of the National Academy of Sciences.
[82] Y. Xia,et al. A Multiple-Index Model and Dimension Reduction , 2008 .
[83] Ryota Tomioka,et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning , 2014, ICLR.
[84] Anant Sahai,et al. Harmless interpolation of noisy data in regression , 2019, 2019 IEEE International Symposium on Information Theory (ISIT).
[85] Mikhail Belkin,et al. Two models of double descent for weak features , 2019, SIAM J. Math. Data Sci..
[86] Xiaohan Wei,et al. Non-Gaussian Observations in Nonlinear Compressed Sensing via Stein Discrepancies , 2016, arXiv:1609.08512.
[87] Yuan Cao,et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks , 2019, ArXiv.
[88] Peter D. Hoff,et al. Lasso, fractional norm and structured sparse estimation using a Hadamard product parametrization , 2016, Comput. Stat. Data Anal..
[89] Yaniv Plan,et al. The Generalized Lasso With Non-Linear Observations , 2015, IEEE Transactions on Information Theory.
[90] Hongyang Zhang,et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations , 2017, COLT.
[91] Qiang Sun,et al. User-Friendly Covariance Estimation for Heavy-Tailed Distributions , 2018, Statistical Science.
[92] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study , 2010, arXiv:1009.2048.
[93] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[94] R. Dennis Cook,et al. Sparse Minimum Discrepancy Approach to Sufficient Dimension Reduction with Simultaneous Variable Selection in Ultrahigh Dimension , 2018, Journal of the American Statistical Association.
[95] D. Brillinger. A Generalized Linear Model With “Gaussian” Regressor Variables , 2012 .
[96] P. McCullagh,et al. Generalized Linear Models , 1992 .
[97] Gilad Yehudai,et al. On the Power and Limitations of Random Features for Understanding Neural Networks , 2019, NeurIPS.
[98] Grant M. Rotskoff,et al. Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error , 2018, ArXiv.
[99] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[100] A. Tsybakov,et al. Estimation of high-dimensional low-rank matrices , 2009, arXiv:0912.5338.
[101] A. Montanari,et al. The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime , 2019 .
[102] Ruslan Salakhutdinov,et al. Geometry of Optimization and Implicit Regularization in Deep Learning , 2017, ArXiv.
[103] Ziwei Zhu,et al. Taming heavy-tailed features by shrinkage , 2021, AISTATS.
[104] Yaniv Plan,et al. One‐Bit Compressed Sensing by Linear Programming , 2011, ArXiv.
[105] Xiaohan Wei,et al. Structured Recovery with Heavy-tailed Measurements: A Thresholding Procedure and Optimal Rates , 2018, arXiv:1804.05959.
[106] Yuanzhi Li,et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[107] Sanjeev Arora,et al. Implicit Regularization in Deep Matrix Factorization , 2019, NeurIPS.
[108] Weichen Wang,et al. A Shrinkage Principle for Heavy-Tailed Data: High-Dimensional Robust Low-Rank Matrix Recovery , 2016, Annals of Statistics.
[109] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[110] Jianqing Fan,et al. Generalized Partially Linear Single-Index Models , 1997 .
[111] J. Horowitz. Semiparametric and Nonparametric Methods in Econometrics , 2007 .
[112] Lei Wu,et al. A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics , 2019, Science China Mathematics.
[113] Nathan Srebro,et al. The Implicit Bias of Gradient Descent on Separable Data , 2017, J. Mach. Learn. Res..
[114] E. Candès. The restricted isometry property and its implications for compressed sensing , 2008 .
[115] Yingcun Xia,et al. On extended partially linear single-index models , 1999 .
[116] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[117] Tianxi Cai,et al. L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs , 2015, J. Mach. Learn. Res..
[118] Laurent Jacques,et al. Robust 1-Bit Compressive Sensing via Binary Stable Embeddings of Sparse Vectors , 2011, IEEE Transactions on Information Theory.
[119] Lin F. Yang,et al. Misspecified nonconvex statistical optimization for sparse phase retrieval , 2019, Mathematical Programming.
[120] Jun S. Liu,et al. On consistency and sparsity for sliced inverse regression in high dimensions , 2015, arXiv:1507.03895.
[121] Jianqing Fan,et al. Large Covariance Estimation Through Elliptical Factor Models , 2015, Annals of Statistics.
[122] Matus Telgarsky,et al. A refined primal-dual analysis of the implicit bias , 2019, ArXiv.
[123] E Weinan,et al. On the Generalization Properties of Minimum-norm Solutions for Over-parameterized Neural Network Models , 2019, ArXiv.
[124] Yi Zhou,et al. When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models? , 2018 .
[125] Jason D. Lee,et al. Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks , 2019, ICLR.
[126] Suvrit Sra,et al. Small nonlinearities in activation functions create bad local minima in neural networks , 2018, ICLR.
[127] Alexandre B. Tsybakov,et al. Introduction to Nonparametric Estimation , 2008, Springer series in statistics.
[128] Aaron K. Han. Non-parametric analysis of a generalized regression model: the maximum rank correlation estimator , 1987 .
[129] Ker-Chau Li,et al. Slicing Regression: A Link-Free Regression Method , 1991 .
[130] Ker-Chau Li,et al. On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein's Lemma , 1992 .
[131] H. Bernhard Schlegel,et al. Geometry optimization , 2011 .
[132] Krishnakumar Balasubramanian,et al. Tensor Methods for Additive Index Models under Discordance and Heterogeneity , 2018, arXiv:1807.06693.
[133] Nathan Srebro,et al. Convergence of Gradient Descent on Separable Data , 2018, AISTATS.
[134] Bo Jiang,et al. Variable selection for general index models via sliced inverse regression , 2013, arXiv:1304.4056.
[135] Yuan Cao,et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks , 2018, ArXiv.
[136] Andrea Montanari,et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation , 2019, Annals of Statistics.
[137] Nathan Srebro,et al. Characterizing Implicit Bias in Terms of Optimization Geometry , 2018, ICML.
[138] Razvan Pascanu,et al. Local minima in training of neural networks , 2016, arXiv:1611.06310.
[139] Eric R. Ziegel,et al. Generalized Linear Models , 2002, Technometrics.
[140] Ziwei Zhu. Taming the heavy-tailed features by shrinkage and clipping , 2017, arXiv:1710.09020.
[141] Stanislav Minsker. Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries , 2016, The Annals of Statistics.
[142] Francis Bach,et al. Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks , 2019, NeurIPS.