Harmless interpolation of noisy data in regression
Vidya Muthukumar | Kailas Vodrahalli | Anant Sahai