Generalization Error without Independence: Denoising, Linear Regression, and Transfer Learning