Generalization Error without Independence: Denoising, Linear Regression, and Transfer Learning