Understanding Implicit Regularization in Over-Parameterized Single Index Model