When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples

Regularization aims to improve prediction performance by trading an increase in training error for better agreement between training and prediction errors, a trade-off often captured through decreased degrees of freedom. In this paper we give examples showing that regularization can increase the degrees of freedom in common models, including the lasso and ridge regression. In such situations, both training error and degrees of freedom increase, so the regularization is inherently without merit. We describe two important scenarios in which the expected reduction in degrees of freedom is guaranteed: all symmetric linear smoothers, and convex-constrained linear regression models such as ridge regression and the lasso when compared with unconstrained linear regression.
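For reference, the degrees of freedom at issue is the standard Stein/Efron covariance quantity; the following is a minimal statement, assuming Gaussian observations y ~ N(mu, sigma^2 I) and a fitting rule \hat{y} = f(y) (the notation here is ours, not taken from the paper):

\[
\mathrm{df}(\hat{y}) \;=\; \frac{1}{\sigma^2} \sum_{i=1}^{n} \mathrm{Cov}(\hat{y}_i, y_i),
\qquad \text{so for a linear smoother } \hat{y} = S y, \quad \mathrm{df}(\hat{y}) = \mathrm{tr}(S).
\]

For penalized ridge regression, for instance, S_\lambda = X (X^\top X + \lambda I)^{-1} X^\top is symmetric, and \mathrm{df}(\lambda) = \sum_j d_j^2 / (d_j^2 + \lambda) for singular values d_j of X, which is monotone decreasing in \lambda. That fit falls under the symmetric-linear-smoother guarantee, so the counterexamples above necessarily come from fits outside that class, such as constrained rather than penalized formulations.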
